Comprehensive benchmarking of computational tools for predicting toxicokinetic and physicochemical properties of chemicals

Gadaleta, Domenico; Serrano-Candelas, Eva; Ortega-Vallbona, Rita; Colombo, Erika; Garcia de Lomana, Marina; Biava, Giada; Aparicio-Sánchez, Pablo; Roncaglioni, Alessandra; Gozalbes, Rafael; Benfenati, Emilio

doi:10.1186/s13321-024-00931-z

Research
Open access
Published: 26 December 2024

Journal of Cheminformatics volume 16, Article number: 145 (2024)

Abstract

Ensuring the safety of chemicals for environmental and human health involves assessing physicochemical (PC) and toxicokinetic (TK) properties, which are crucial for absorption, distribution, metabolism, excretion, and toxicity (ADMET). Computational methods play a vital role in predicting these properties, given the current trends in reducing experimental approaches, especially those that involve animal experimentation. In the present manuscript, twelve software tools implementing Quantitative Structure–Activity Relationship (QSAR) models were selected for the prediction of 17 relevant PC and TK properties. A total of 41 validation datasets were collected from the literature, curated and used for assessing the models’ external predictivity, emphasizing the performance of the models inside the applicability domain. Overall, the results confirmed the adequate predictive performance of the majority of the selected tools, with models for PC properties (R² average = 0.717) generally outperforming those for TK properties (R² average = 0.639 for regression, average balanced accuracy = 0.780 for classification). Notably, several of the tools evaluated exhibited good predictivity across different properties and were identified as recurring optimal choices. Moreover, a systematic analysis of the chemical space covered by the external validation datasets confirmed the validity of the collected results for relevant chemical categories (e.g., drugs and industrial chemicals), further increasing the confidence in the overall evaluation. The best performing models were ultimately suggested for each investigated property and proposed as robust computational tools for high-throughput assessment of highly relevant chemical properties.

Scientific contribution

The present manuscript provides an overview of the state-of-the-art available computational tools for predicting the PC and TK properties of chemicals. The results here offer valuable guidance to researchers, regulatory authorities, and the industry in identifying robust computational tools suitable for predicting relevant chemical properties in the context of chemical design, toxicity and environmental fate assessment.

Introduction

Ensuring the safety of chemicals for both the environment and human health requires a delicate balance of physicochemical (PC) and toxicokinetic (TK) properties, which are crucial for absorption, distribution, metabolism, excretion, and overall toxicity (ADMET). The optimization of TK and PC profiles is paramount in drug discovery and development, given that 40–60% of drug failures in clinical trials were reported to stem from PC and bioavailability deficiencies [1,2,3]. In this regard, the optimization of drug bioavailability is a major purpose of preclinical experimentation, aimed at eliminating weak candidates and identifying the drug candidates most likely to succeed [4]. Similarly, the commercialization of pesticides, food additives, consumer products, and industrial products entails substantial health and ecological risks and contributes to environmental pollution and occupational diseases. Pesticides, in particular, have faced market restrictions due to their environmental persistence, bioaccumulation, and toxicity [5] caused by their unfavorable ADMET profile [6].

Given these considerations, effective determination of TK and PC properties is of utmost importance. While in vitro and in vivo methods have been widely employed, the impossibility of conducting experiments on a large number of compounds due to cost and time constraints [7, 8] has encouraged the development and enhancement of computational methods for the prediction of chemical properties [9]. These methods have shown promising potential in predicting PC and TK properties by correlating these endpoints to molecular features through quantitative structure–activity relationship (QSAR) models [7, 10, 11].

In recent years, collaborative research efforts, such as the EU-funded ONTOX project (ontology-driven and artificial intelligence-based repeated dose toxicity testing of chemicals for next-generation risk assessment), have emerged. The ONTOX project focuses on developing new approach methodologies (NAMs) incorporating artificial intelligence (AI) to address systemic repeated-dose toxicity and enable human risk assessment. AI plays a pivotal role in effectively integrating chemical, biological, mechanistic, toxicological, epidemiological, and kinetic data collected during this project [12] and PC and TK data represent an essential part of this integrated approach.

This manuscript presents the results of a review conducted within the ONTOX project, aiming to identify suitable software implementing QSAR models for accurate predictions of relevant PC and TK properties of chemicals. Multiple datasets were collected from the literature and curated at structural and property level to validate and benchmark a selection of software programs implementing QSARs for the prediction of various PC and TK properties, with a specific emphasis on freely available predictive tools. External validation performance was considered when comparing models for a given property, incorporating crucial aspects such as the inclusion of predicted chemicals in the applicability domain (AD) and training set (TS) of each model.

Methods

Dataset selection

A literature review was performed to identify chemical datasets including experimental data for the properties of interest. A manual search was performed using standard online search tools across several scientific databases (such as Google Scholar, PubMed, Scopus, Web of Science and Dimensions).

In addition to directly searching those databases, the collection of chemical datasets was boosted by using an in-house script that applies web scraping algorithms to obtain information from API sources (such as PyMed [https://pypi.org/project/pymed/] to access the PubMed database) or by automatic (direct) access to their websites. The search terms included an exhaustive list of keywords for the specific PC and TK endpoints of interest (Table 1, column “Endpoint”) and combinations of them. These included standard abbreviations (Table 1, column “Abbreviations”), the use of regular expressions when possible and other options to avoid missing information due to capitalization, format, abbreviations or different symbols. The acronyms used within the entire manuscript and the units of the values used for each property are reported in Table 1.
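The keyword matching described above can be illustrated with a minimal sketch. It assumes a small, hypothetical dictionary of endpoint names and abbreviations (stand-ins for the entries of Table 1) and shows one possible way of tolerating differences in capitalization, spacing and hyphenation with regular expressions; it is not the in-house script used by the authors.

```python
import re

# Hypothetical endpoint terms; the real list is given in Table 1 of the paper.
ENDPOINT_TERMS = {
    "logP": ["octanol-water partition coefficient", "log P", "log Kow"],
    "BBB": ["blood-brain barrier", "BBB"],
}

def build_pattern(terms):
    """Compile one case-insensitive pattern for a list of terms.

    Words inside a term may be separated by spaces, hyphens, or nothing,
    so "log P", "logP" and "LOG-P" all match the term "log P".
    """
    alternatives = []
    for term in terms:
        words = re.split(r"[\s\-]+", term)
        alternatives.append(r"[\s\-]*".join(re.escape(w) for w in words))
    # Lookarounds keep the match from starting or ending inside another word.
    return re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)",
                      re.IGNORECASE)

def match_endpoints(text):
    """Return the sorted endpoint keys whose terms occur in the text."""
    return sorted(name for name, terms in ENDPOINT_TERMS.items()
                  if build_pattern(terms).search(text))

if __name__ == "__main__":
    print(match_endpoints("Measured Log P and blood brain barrier permeability"))
```

In practice, such patterns would be applied to titles and abstracts returned by a literature API before manual inspection of the hits.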

Table 1 List of PC and TK properties

Data curation

For those substances where SMILES was not reported in the original dataset, the isomeric SMILES were retrieved using the PubChem PUG (Power User Gateway) REST service (https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest) from CAS numbers and, if these were not available, from chemical names. The SMILES representing chemical structures were further standardized and curated with an automated in-house procedure, which uses functions of the RDKit Python package (https://www.rdkit.org). This procedure covers the identification and removal of inorganic and organometallic compounds, mixtures, and compounds including unusual chemical elements (i.e., those different from H, C, N, O, F, Br, I, Cl, P, S, Si), as well as the neutralization of salts, the removal of duplicates at SMILES level and the standardization of chemical structures. In addition, data from different datasets associated with the same endpoint were converted to the same unit to allow comparison. Duplicated compounds inside each dataset were treated as follows:

In the case of continuous data, duplicated compounds with a standardized standard deviation (standard deviation/mean) greater than 0.2 were considered to have ambiguous values and were removed, whereas experimental values were averaged when the standardized standard deviation did not exceed this threshold.
In the case of binary classification data, only compounds with the same response values were retained.
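The two duplicate-handling rules above can be sketched as follows. This is a minimal illustration, not the authors' in-house procedure: the function names are hypothetical, and the use of the population standard deviation and of the absolute mean in the ratio are assumptions, since the text does not specify them.

```python
from collections import defaultdict
from statistics import mean, pstdev

CV_THRESHOLD = 0.2  # standardized standard deviation cut-off stated in the text

def merge_continuous(records):
    """records: iterable of (smiles, value) pairs.

    Returns {smiles: value}; replicate values are averaged, and compounds
    whose replicates disagree too much are dropped.
    """
    grouped = defaultdict(list)
    for smiles, value in records:
        grouped[smiles].append(value)
    merged = {}
    for smiles, values in grouped.items():
        if len(values) == 1:
            merged[smiles] = values[0]
            continue
        avg = mean(values)
        if avg == 0:
            continue  # ratio undefined; treated as ambiguous in this sketch
        # Assumption: population SD divided by the absolute mean.
        if pstdev(values) / abs(avg) <= CV_THRESHOLD:
            merged[smiles] = avg
    return merged

def merge_binary(records):
    """Keep only compounds whose duplicated labels all agree."""
    grouped = defaultdict(set)
    for smiles, label in records:
        grouped[smiles].add(label)
    return {s: next(iter(labels)) for s, labels in grouped.items()
            if len(labels) == 1}

if __name__ == "__main__":
    data = [("CCO", 1.0), ("CCO", 1.1), ("CCN", 1.0), ("CCN", 2.0)]
    print(merge_continuous(data))  # CCN is dropped, CCO is averaged
```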

Experimental data for the chemicals in the collected datasets were curated to exclude (1) response outliers potentially resulting from annotation errors (“intra-outliers”), and (2) chemicals present in multiple datasets with inconsistent experimental property values (“inter-outliers”).

To identify intra-outliers, each dataset was standardized by calculating the Z-score for each data point using the formula:

$$Z\text{-score} = \frac{X - \mu}{\sigma}$$

where X is the data point, μ is the mean, and σ is the standard deviation. Data points with a Z-score greater than 3 were considered outliers and removed from the datasets.
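A minimal sketch of this filter is given below. The function name is hypothetical, and two choices are assumptions not stated in the text: the cut-off is applied to the absolute Z-score (so that extreme values on both tails are removed), and the population standard deviation is used.

```python
from statistics import mean, pstdev

def remove_intra_outliers(values, z_max=3.0):
    """Return the values whose absolute Z-score does not exceed z_max."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return list(values)  # all values identical, nothing to remove
    return [x for x in values if abs((x - mu) / sigma) <= z_max]

if __name__ == "__main__":
    measurements = [0.0] * 20 + [100.0]
    print(len(remove_intra_outliers(measurements)))  # the single extreme value is removed
```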

To detect ambiguous values for compounds shared across different datasets (inter-outliers), experimental property values were compared. The correlation between these values was analyzed for each dataset pair associated with a given property. Following the same criteria used for duplicates inside each dataset, compounds exhibiting a standardized standard deviation greater than 0.2 across datasets for the same property were considered to have ambiguous values and were removed from all corresponding datasets, whereas experimental values were averaged when the standardized standard deviation did not exceed this threshold. Those datasets that exhibited a correlation value below 0.8 with the rest of the datasets were discarded from further analysis.
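The pairwise comparison can be sketched as below for two datasets stored as `{smiles: value}` dictionaries. The function names are hypothetical, and the use of the Pearson coefficient is an assumption, as the text does not state which correlation measure was applied.

```python
from math import sqrt
from statistics import mean, pstdev

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def compare_datasets(a, b, cv_max=0.2, r_min=0.8):
    """Compare the compounds shared by two {smiles: value} datasets.

    Returns (r, ambiguous, consistent): the correlation over shared
    compounds, the set of compounds with inconsistent values, and whether
    the pair passes the correlation threshold.
    """
    shared = sorted(set(a) & set(b))
    if len(shared) < 2:
        raise ValueError("need at least two shared compounds")
    r = pearson([a[s] for s in shared], [b[s] for s in shared])
    ambiguous = set()
    for s in shared:
        pair = [a[s], b[s]]
        avg = mean(pair)
        # A zero mean with differing values is treated as ambiguous here.
        if (avg == 0 and pair[0] != pair[1]) or (
                avg != 0 and pstdev(pair) / abs(avg) > cv_max):
            ambiguous.add(s)
    return r, ambiguous, r >= r_min

if __name__ == "__main__":
    d1 = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
    d2 = {"A": 1.0, "B": 2.1, "C": 2.9, "D": 8.0}
    print(compare_datasets(d1, d2))
```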

An overview of the whole data collection and curation procedure is depicted in Fig. 1, whereas data distribution before and after the removal of outliers and chemicals showing inconsistent values across datasets is provided in Additional file 1.

As a result of the literature review and the curation procedure, 41 datasets were collected (21 for PC properties and 20 for TK properties). The list of datasets collected, together with the number of chemicals included before and after the curation procedure is reported in Table 2. The curated datasets are available in Additional file 2.

Table 2 List of validation datasets collected for relevant PC and TK properties

Chemical space analysis

The applicability of the results of this analysis is strictly limited to the chemical space investigated, that is, the space covered by the datasets used for model evaluation. To obtain a better view of the utility of the collected datasets, chemicals were plotted against a reference chemical space covering the main chemical categories of real-life interest. Specifically, the reference chemical space included data from (1) the ECHA database (https://echa.europa.eu/da/information-on-chemicals/registered-substances) of substances registered under the REACH regulation as representative of industrial chemicals, (2) DrugBank (https://go.drugbank.com/) [46] as representative of approved drugs, and (3) the Natural Products Atlas (https://www.npatlas.org/) [47] as representative of natural chemical products. Compounds were standardized, then functional-class circular fingerprints (FCFP) with a radius of 2, folded to 1024 bits, were computed using CDK (https://cdk.github.io/). Principal component analysis (PCA) with two components was applied to the descriptor matrix. Next, each of the collected datasets was plotted on the resulting two-dimensional chemical space defined by the PCA to determine the chemical categories covered during the validation.

Tool selection

A list of software implementing QSAR models for predicting PC and TK properties was identified from the literature, with the objective of validating each of them with the collected experimental data and comparing their predictive performance. During selection, freely available public software and software made available by the project’s partners were prioritized. Additional aspects considered for software selection were associated with their usability, with particular attention given to the capacity to perform batch predictions for very large datasets (i.e., several thousand compounds). This criterion led to the exclusion of some initially selected software, such as Way2Drug (http://www.way2drug.com/) and admetSAR (http://lmmd.ecust.edu.cn/admetsar2/).

Moreover, tools that allowed the evaluation of the model’s AD and whose TS was publicly available were preferred over those tools that did not provide this information.

In the end, twelve software and web services were selected for comparison, implementing approximately eighty QSAR models to predict PC and TK properties. The complete list of selected software and QSARs evaluated in this work is reported below and in Table 3.

Table 3 List of predictive software selected for the benchmark

OPERA (Open (Quantitative) Structure–activity/property Relationship App version 2.9) from the U.S. National Institute of Environmental Health Sciences (NIEHS) is an open-source battery of QSAR models for predicting various PC properties, environmental fate parameters, and toxicity endpoints. AD assessment is performed using two complementary methods (leverage and vicinity of query chemicals) to identify reliable predictions [48]. A total of ten models from OPERA were validated.
VEGA (https://www.vegahub.eu/portfolio-item/vega-qsar/) version 1.2.3 is a freely available platform encompassing several QSAR models for predicting various PC, toxicological and ecotoxicological endpoints. A series of independent methods ensures the evaluation of AD through a combination of four different scores [49]. For the present work, the following VEGA models were evaluated: three different models for logP, namely VEGA ALOGP (v 1.0.0), VEGA MEYLAN KOWWIN (v 1.4.4, implemented from EPISuite) and VEGA MLOGP (v 1.0.0); two different FUB models (VEGA logK and VEGA CORAL) [50]; two models for skin permeation (VEGA logKp Potts-Guy v1.0.1 and VEGA logKp ten Berge v1.0.1); the Water solubility model (IRFMN) (v1.0.2); and the P-glycoprotein activity model (NIC) (v1.0.1).
The EPI (Estimation Programs Interface) Suite (https://www.epa.gov/tsca-screening-tools/epi-suitetm-estimation-program-interface) version 4.1 by the U.S. EPA and Syracuse Research Corp. is a suite of PC property and environmental fate estimation models. For the present work, the following models were used: 1) KOWWIN™ for logP estimation; 2) WSKOWWIN™, which estimates logWS by applying correction factors to logP predictions generated with KOWWIN™; 3) MPBPWIN™, which estimates MP, BP, and logVP of organic chemicals using a combination of different techniques; 4) HENRYWIN™, which calculates logH with two models, based on the group contribution and the bond contribution methods, respectively [51].
The Toxicity Estimation Software Tool (TEST) version 5.1.2 is a suite of computational models for predicting toxicological, ecotoxicological and PC endpoints through a consensus of different QSAR methodologies. For the present work, consensus models for predicting MP, BP, logWS and logVP were evaluated, integrating three different approaches, namely hierarchical clustering, group contribution and nearest neighbor methods [52].
The Online Chemical Modeling Environment (OCHEM) v.4.3.156 is a web-based platform that includes a user-contributed database of experimental data and a modeling framework aimed at guiding external users to contribute with their own computational models [53]. For the present work, the ability of the ALOGPS 2.1 model to predict logP and logWS was evaluated [54].
ADMETLab version 3.0 [55] (admetmesh.scbdd.com) is a web-based application that implements a multitask graph attention computational framework, which was used for the prediction of five PC (BP, MP, logP, logWS, logD) and seven TK (Caco-2, FUB, BBB, F30%, HIA, Pgp.sub, Pgp.inh) properties of interest. ADMETLab predictions for the classification endpoints were considered inside the AD when characterized by a prediction probability higher than 0.7 or lower than 0.3.
ProtoPRED (http://www.protopred.protoqsar.com) v1.0 is a computational platform that implements proprietary QSAR models to predict a wide spectrum of chemical properties. For the present work, 13 models included in the ProtoPHYSCHEM and in the ProtoADME modules were evaluated.
The “Calculators and Predictors” tool provided by ChemAxon (https://chemaxon.com) v24.1.0 was used to predict acidic and basic pKa values [56]. The “cxcalc” command line tool was used to perform batch calculations.
pkCSM (http://www.biosig.lab.uq.edu.au/pkcsm) (accessed September, 2024) is a web-based platform that implements graph-based signature classification and regression models to predict ADMET properties [57]. For the present exercise, eight models were evaluated.
SwissADME (http://www.swissadme.ch/) (accessed September, 2024) is a website that allows the computation of PC properties, ADME parameters and the drug-likeness of small molecules to support drug discovery [58]. SwissADME was evaluated for the prediction of logP, logWS, BBB and HIA (based on the “BOILED-egg method”) [59], logKp [37], and Pgp.sub.
vNN-ADMET (https://vnnadmet.bhsai.org/vnnadmet/login.xhtml) (accessed September, 2024) is a web-based application that generates predictions for query chemicals based on the properties of the most similar chemicals included in an internal database. The website implements fifteen ADMET prediction models; of these, the models for predicting BBB permeability, Pgp.inh and Pgp.sub activity were evaluated.
Bayer’s in silico ADMET platform is a proprietary platform developed by Bayer Pharma that implements tools accessible to scientists within the company to predict a variety of TK and PC endpoints in early drug discovery [60]. In the present manuscript, models for logD (pH = 7.4), human FUB and three different models for bioavailability (classifying > F30% at doses of 0.1 mg, 1 mg and 10 mg) were evaluated. The logD model was further evaluated for its ability to replicate logP experimental values.
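As an illustration of the probability-based AD rule stated above for the ADMETLab classification endpoints, the following minimal sketch flags a prediction as inside the AD when its probability is higher than 0.7 or lower than 0.3. The function names and the example probabilities are hypothetical.

```python
def in_admetlab_ad(probability, low=0.3, high=0.7):
    """True when a classification probability counts as inside the AD."""
    return probability > high or probability < low

def split_by_ad(probabilities):
    """Partition probabilities into (inside_ad, outside_ad) lists."""
    inside = [p for p in probabilities if in_admetlab_ad(p)]
    outside = [p for p in probabilities if not in_admetlab_ad(p)]
    return inside, outside

if __name__ == "__main__":
    print(split_by_ad([0.95, 0.55, 0.10, 0.70]))
```

The comparisons are strict, so a probability of exactly 0.7 or 0.3 falls outside the AD in this sketch.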

Validation of the different tools

To assess the predictive performance of the selected QSAR models from Table 3, each model was applied to predict chemicals within the collected databases outlined in Table 2. The evaluation also considered performance metrics when restricting the assessment to chemicals within the AD of the model and those not included in the model’s TS, as external performance serves as tangible evidence of a QSAR model's real-life predictability.

Various statistical metrics were employed to evaluate models providing continuous and categorical predictions. Continuous predictions were compared against experimental values and assessed using the coefficient of determination (R²), the root-mean-square error (RMSE) and the mean absolute error (MAE). For the classification models, Cooper statistics, including specificity (SPE) and sensitivity (SEN), were utilized to measure predictivity. The balanced accuracy (BA), calculated as the average between SEN and SPE, and the Matthews correlation coefficient (MCC) were also determined to estimate overall classification performance, considering potential imbalance between predicted categories [61]. Statistical analyses were conducted using the KNIME Analytics Platform v4.7.5 [62]. Figures were obtained with the Seaborn Python package (https://seaborn.pydata.org/).
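For reference, the metrics named above can be computed as in the following sketch. The analyses in the paper were run in KNIME, so this Python code is only an illustration. R² is computed here as 1 − SS_res/SS_tot, which is an assumption, since the text does not state which definition of the coefficient of determination was used; the function names are hypothetical.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """R2 (as 1 - SS_res/SS_tot), RMSE and MAE for continuous predictions."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "R2": 1 - ss_res / ss_tot,
        "RMSE": sqrt(ss_res / n),
        "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
    }

def classification_metrics(y_true, y_pred):
    """SEN, SPE, BA and MCC for binary labels (1 = positive, 0 = negative).

    Both classes must be present in y_true, otherwise SEN or SPE is undefined.
    """
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"SEN": sen, "SPE": spe, "BA": (sen + spe) / 2, "MCC": mcc}

if __name__ == "__main__":
    print(regression_metrics([1, 2, 3, 4], [1, 2, 3, 5]))
    print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                 [1, 1, 0, 0, 0, 0, 0, 1]))
```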

Results

Figures 2 and 3 present the validation performance of the models for predicting PC and TK properties. The detailed predictions returned by the models for each validation dataset together with information on the inclusion of predictions in the models’ AD and TS are reported in Additional file 2. A complete overview of the performance of all the models with additional statistical parameters is available in Additional file 3, whereas correlations between the experimental values of chemicals in the validation datasets and the predictions by each model are shown in Additional file 4 (for the purpose of showing correlation, validation datasets related to the same property were joined).

The compiled statistical parameters affirm the validity of the majority of the evaluated tools as reliable alternatives for assessing the PC and TK profiles of chemicals. Specifically, the statistical parameters (R² for regression models and BA for classification models) resulting from each validation (i.e., predictions made with a single model for a single dataset corresponding to a specific property) were averaged to evaluate the general performance of single tools and properties (Fig. 4).
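The averaging step can be sketched as follows, assuming each validation is stored as a record with hypothetical `tool`, `property` and `score` fields (R² or BA). The example values are invented for illustration and are not results from this study.

```python
from collections import defaultdict
from statistics import mean

def summarize(validations, key, threshold=0.70):
    """Average the scores per tool or per property.

    Also reports the share of validations at or above the threshold.
    """
    grouped = defaultdict(list)
    for v in validations:
        grouped[v[key]].append(v["score"])
    return {
        group: {
            "avg": mean(scores),
            "share_at_or_above": sum(s >= threshold for s in scores) / len(scores),
        }
        for group, scores in grouped.items()
    }

if __name__ == "__main__":
    # Invented records: one entry per (model, dataset) validation.
    records = [
        {"tool": "ToolA", "property": "logP", "score": 0.90},
        {"tool": "ToolA", "property": "logWS", "score": 0.70},
        {"tool": "ToolB", "property": "logP", "score": 0.60},
    ]
    print(summarize(records, "tool"))
    print(summarize(records, "property"))
```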

When analyzing the overall dataset performance, more than 60% of the validations conducted on the regression models yielded R² values exceeding 0.60, and more than 40% exhibited values surpassing 0.70.

A general trend emerged, highlighting the superior performance of the models that predict PC properties compared to models that predict TK properties. Specifically, validations involving PC predictive models yielded an average R² (R²avg) of 0.717, with 63% of validations characterized by R² ≥ 0.70. In contrast, TK predictive models achieved an R²avg of 0.639, and only 40% of these validations had R² ≥ 0.70.

Further inspection of specific endpoints revealed that models predicting logH (R²avg = 0.935), BP (R²avg = 0.930), logVP (R²avg = 0.915) and logWS (R²avg = 0.797) demonstrated the best average predictive performance. Conversely, models predicting some TK properties such as Caco-2 (R²avg = 0.567) and FUB (R²avg = 0.652) displayed comparatively lower validation performance.

An exception among PC properties was observed in the evaluation of pKa predictors, wherein the machine learning models implemented in OPERA severely underperformed with respect to the ChemAxon predictor, also lowering the average validation performance for this endpoint. Notably, challenges in determining the experimental pKa, particularly for basic compounds, have been reported previously as a consequence of the unfavorable PC properties of these chemicals (e.g., low aqueous solubility), which may compromise the correct execution of the test [27]. The same behavior was observed here, with predictions of basic pKa (R²avg = 0.426) being less accurate than predictions of acidic pKa (R²avg = 0.446).

Among the specific tools tested, TEST, ADMETLab and OCHEM exhibited the highest average performance, with R²avg = 0.844, 0.809 and 0.803, respectively, and over 80% of their validations reaching R² ≥ 0.70. ProtoPRED (R²avg = 0.776), EPI Suite (R²avg = 0.761) and OPERA (R²avg = 0.682) followed. OPERA faces challenges in pKa predictions, which predominantly resulted in validations with R² < 0.50. Apart from this endpoint, the OPERA models consistently demonstrated high predictivity and were identified as the top performers for several properties (BP, logH, logVP, logWS, MP and FUB). It should be noted, however, that the OPERA software contains a version of the CompTox database (https://comptox.epa.gov/dashboard/) and returns experimental data if a predicted chemical is included in that database, potentially altering the performance statistics. Similarly, ProtoPRED was one of the top-performing solutions for logD, MP and logKp. Along with OCHEM ALOGPS 2.1 and ADMETLab, VEGA models (in particular ALOGP) were the top choices for logP predictions. ADMETLab models were among the best choices for logD, logP and Caco-2 (Table 4).

Table 4 List of suggested top predictive models for each property

Restricting validations to predictions inside the models' AD generally improved predictive performance. The R²avg of the models for predicting PC properties increased from 0.717 to 0.740, while the R²avg of the models for predicting TK properties increased from 0.639 to 0.704. Single property predictions within the AD also improved, except for the pKa, due to the exclusion of the ChemAxon predictor, which does not include AD assessment.

As expected, the performance observed for chemicals outside the models' TS was lower than that observed for the entire dataset, with an R²avg of 0.621 for the PC models and an R²avg of 0.605 for the TK models. The consideration of model performance on unseen chemicals is crucial for evaluating model applicability and reliability in real-world scenarios beyond the scope of their TS.

When validated across the entire datasets, the classification models achieved an average balanced accuracy (BAavg) of 0.780, with 78% of the validations exceeding BA = 0.70. Notably, the models performed generally well in predicting the TK properties, namely P-gp inhibition (BAavg = 0.905), P-gp substrate (BAavg = 0.778), HIA (BAavg = 0.777), bioavailability (F30%) (BAavg = 0.742) and BBB (BAavg = 0.741), with vNNADMET (BAavg = 0.853), ProtoPRED (BAavg = 0.837) and ADMETLab (BAavg = 0.813) being the best predictors (Table 4).

Analogously to regression models, classifiers exhibit an improvement in performance when restricted to predictions within the models' AD, with BAavg increasing from 0.780 to 0.841, and a slight reduction in performance when considering only chemicals outside their TS, with BAavg decreasing from 0.780 to 0.731.

Discussion

Practical utility of the model validation

This manuscript presents a comprehensive benchmarking of various computational software programs designed for predicting chemical properties crucial for TK and PC assessments of compounds. The aim is to provide insights into selecting optimal tools for high-throughput property assessment, through a robust statistical validation conducted on benchmark datasets sourced from the literature.

In recent years, several authors have underscored the significance of using in silico resources for PC [63, 64] and ADMET prediction [65,66,67,68,69,70]. While not aiming for exhaustiveness, this manuscript focuses on discussing the practical utility of well-known in silico resources, particularly emphasizing freely available tools capable of simultaneous high-throughput predictions.

The practical utility of the software is assessed through statistical validation, with a specific emphasis on external predictivity, which is crucial for real-life applicability and regulatory acceptance, in line with the principles outlined in the OECD QSAR Assessment Framework [71]. In particular, the framework emphasizes the importance of evaluating performance within the models' AD to ensure correct model application. The AD is a theoretical region in chemical space where predictions have a defined reliability and are not the result of training data extrapolation [72]. In this regard, when suggesting optimal models for a property, preference was given to those with strong predictive performance within their AD, rather than models that performed well overall but had lower AD-specific performance or did not evaluate the AD.

The ease of applicability is another criterion used to select the tools presented in the manuscript. Indeed, software programs implementing models are user-friendly and require minimal input from users, needing only the SMILES notation of the compounds to generate predictions. While some tools include features to normalize the SMILES, it is recommended that users properly prepare SMILES before submitting them for predictions. This preparation can be done using available tools to remove counterions, neutralize and normalize functional groups, and possibly calculate canonical SMILES [73, 74]. Importantly, no descriptors need to be pre-calculated by the end user, as all necessary operations are automatically handled by the software. This minimal user intervention enhances the reproducibility of results and reduces the overall complexity of the calculation process, making these tools highly accessible.

The real-life validity of the collected validation results is an aspect of utmost importance. Today, identifying reliable methods for high-throughput assessment of PC and TK properties holds fundamental utility in diverse fields such as safe chemical design [75, 76], chemical toxicity prediction [77], environmental fate and ecotoxicity assessment [78] and understanding chemical distribution in the human body.

A practical application of this last point is the parametrization of physiologically based kinetic (PBK) models. These mathematical models predict the effects of chemicals on the body by incorporating factors such as blood flow, plasma protein binding, and tissue composition [79]. However, the low-throughput nature and consequent low applicability of this strategy has led to an increasing use of computational predictions for key properties. For instance, Geci et al. [80] leveraged the results reported here to develop a high-throughput screening PBK model (HT-PBK). The authors evaluated the performances of HT-PBK models parameterized with predictions of the top-predicting tools considered here, demonstrating similar predictivity compared to models parameterized with in vivo and in vitro benchmark data.

Chemical space analysis

Another critical aspect to consider when assessing the practical utility of the performed validations is the chemical coverage of the datasets used to evaluate the predictive capability of each tool, as conclusions drawn should be limited to the categories of chemicals covered by the validation datasets. Although validation data were not selected with the aim of covering specific chemical categories, having datasets of different sizes gives a more nuanced view of model capabilities. In particular, benchmarking on small datasets allows for a detailed evaluation of targeted chemical domains, revealing how well models handle specific areas of the chemical space, but may risk limited generalizability. In contrast, the use of large general databases (such as the PHYSPROP datasets) offered a broader assessment, highlighting how models perform across diverse chemical spaces and property ranges.

Following this reasoning, chemical space analyses were performed to understand how each dataset covers different chemical classes. A reference chemical space was constructed by integrating different datasets representing three distinct chemical macro-categories: drugs, natural products, and industrial chemicals. Subsequently, the validation datasets were projected into this space to discern the chemical categories covered during the validation. Chemicals belonging to the three categories (drugs, industrial chemicals and natural products) used to construct the reference chemical space are included in Additional file 5.

As depicted in Fig. 5, the datasets selected for the majority of the properties cover a substantial portion of the reference chemical space, particularly in the areas associated with drugs (blue density curves) and industrial chemicals (green density curves). The region covering industrial chemicals is well populated by PC data, such as those for BP, MP, logWS, logVP, and pKa. These properties also have the largest validation datasets among those collected here, making the conclusions of this analysis reasonably reliable. This observation is not unexpected, considering that the PC characterization of industrial chemicals is a prerequisite for their production and commercialization. Conversely, the abundance in the drug and especially in the natural products space (orange density curves) was lower for those properties. An exception is the logP, recognized as a parameter of utmost importance and one of the first measured properties when assessing drug bioavailability.

The datasets used to validate TK predictions are generally smaller than those used to validate PC predictions. The smaller amount of data reflects both the greater complexity of the tests and the fact that these properties are relevant mainly to chemicals intended for human consumption, such as drugs. The chemicals in the datasets used for validating TK models are primarily distributed in the drug chemical space, with some properties (e.g., PPB, BBB, Pgp.inh and Pgp.sub) also extending into the area of natural products.

Comparison between PC and TK properties

As highlighted in the Results section, regression models predicting PC properties yield more accurate results than those predicting TK properties. In general, determining PC properties often involves standardized and well-established methods, whereas the complexity of TK properties arises from the need to understand how a substance interacts within a living organism. This process can vary significantly between different species, increasing the complexity of experimental protocols, the experimental and biological variability and the uncertainty of TK measurement. These considerations align with the concept that the inherent uncertainty associated with QSAR predictions is strictly dependent on the uncertainty in the experimental data; consequently, a model becomes less accurate as the uncertainty of the training data increases [81].
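The link between experimental uncertainty and apparent accuracy can be illustrated with a small simulation. This is only a sketch under simplifying assumptions (normally distributed property values and measurement errors, and a hypothetical "model" that predicts the true value exactly); the noise levels are arbitrary and are not taken from the study. Even such an ideal model cannot exceed an R² of roughly σ²(true) / (σ²(true) + σ²(noise)) when it is scored against noisy measurements.

```python
import random
import statistics

def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observed values."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

random.seed(0)
# Hypothetical "true" property values with unit variance.
true_values = [random.gauss(0.0, 1.0) for _ in range(5000)]

def noise_ceiling(noise_sd):
    """R² of a perfect model scored against measurements with the given noise."""
    measured = [t + random.gauss(0.0, noise_sd) for t in true_values]
    return r_squared(measured, true_values)

r2_low_noise = noise_ceiling(0.2)   # e.g. a well-standardized assay
r2_high_noise = noise_ceiling(0.8)  # e.g. a more variable assay
```

Under these assumptions the attainable R² drops as the measurement noise grows, which is consistent with the lower figures observed for the more variable TK endpoints.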

Furthermore, PC properties are exclusively related to the molecular structure of a compound, facilitating the establishment of relationships effectively captured by QSAR models. These properties are influenced by factors that are easier to represent with molecular descriptors. Conversely, predicting TK properties can be more challenging, as these depend on three-dimensional biological interactions that are often not properly captured by 2D molecular descriptors alone.

Study limitations

A major limitation affecting the validation concerns the variation in performance of some models across different datasets. This variation can likely be attributed to the different levels of experimental data curation performed by the authors of the original studies. Insufficient curation may lead to erroneous or arbitrarily assigned values, and may cause different datasets to present inconsistent values for the same property. Moreover, this issue is particularly relevant for endpoints inherently more prone to experimental variability due to potentially different experimental conditions and methodological limits in original measurements.

In the present work, we attempted to address these limitations by checking datasets for response outliers originating from annotation errors (“intra-outliers”) and shared chemicals showing inconsistent values across datasets (“inter-outliers”). The reduction in size after outlier removal was particularly substantial for some datasets of certain lipophilicity-related properties, such as FUB, due to the methodological limits discussed above.
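The two checks can be sketched as follows. Additional file 1 refers to a Z-score analysis for intra-outliers and to a comparison of records shared between datasets for inter-outliers; the thresholds, function names and toy values below are assumptions made for illustration and are not the settings used in the study.

```python
import statistics

def intra_outliers(values, z_threshold=3.0):
    """Return indices whose Z-score within one dataset exceeds the threshold."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / sd > z_threshold]

def inter_outliers(dataset_a, dataset_b, max_difference):
    """Return chemicals shared by two datasets whose values disagree too much.

    Each dataset maps a structure identifier (e.g. a curated SMILES) to a value.
    """
    shared = dataset_a.keys() & dataset_b.keys()
    return sorted(k for k in shared if abs(dataset_a[k] - dataset_b[k]) > max_difference)

# Toy values for one dataset; the last entry mimics an annotation error.
values = [1.2, 0.8, 1.5, 1.1, 0.9, 1.3, 1.0, 1.4, 0.7, 1.6, 1.2, 25.0]
flagged = intra_outliers(values)

# Toy values for the same property reported by two hypothetical sources.
source_a = {"CCO": -0.3, "c1ccccc1": 2.1, "CCCCO": 0.9}
source_b = {"CCO": -0.2, "c1ccccc1": 4.0, "CC(=O)O": -0.2}
conflicts = inter_outliers(source_a, source_b, max_difference=1.0)
```

In this toy example only the last value of the single dataset and the benzene record shared by the two sources are flagged; what counts as an acceptable difference would in practice depend on the property and its units.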

Despite the data curation performed, the practical difficulty in manually verifying the conditions under which the property values were generated in each of the original source papers still made it difficult to achieve uniform model performance, thus exacerbating the challenge of model validation.

Another potential source of variability across datasets is how representative each dataset is. As highlighted in our analysis, datasets cover different areas of the chemical space, both across properties and across datasets for the same property, which can bias performance metrics. For example, datasets consisting mainly of pharmaceutical products are likely to contain more accurate experimental values than those consisting mainly of industrial chemicals, because more demanding legislation requires a careful assessment of the properties of candidate drugs during their development and clinical evaluation.

Furthermore, different computational tools may have inherent algorithmic biases or limitations only partially reflected by the AD assessment. For instance, certain software may be optimized for specific types of chemicals, leading to performance variations.

Despite these limitations, the results achieved are corroborated by the fact that all models were validated on the same datasets. This simultaneous validation ensures that the comparisons are not compromised by the data issues, as all models encountered the same data challenges. Moreover, the highlighted limitations in the validation data likely cause an underestimation of the models' overall performance. Considering the potential data issues and the relatively good performance observed in this exercise, the overall quality of the models is encouraging.

Conclusions

The present manuscript describes a large benchmarking study aimed at providing insights into the performance and practical utility of twelve computational tools implementing several QSAR models for predicting the TK and PC properties of chemicals. External predictivity, crucial for real-world applicability and regulatory acceptance, was emphasized throughout the performance assessment.

The results of the evaluation suggest generally greater predictivity of models predicting PC properties when compared to those predicting TK properties, reflecting the inherent challenges of QSAR in predicting more complex properties, which is also due to the inherent uncertainty of the experimental data.

In addition, restricting validations to the models' AD generally improves predictive performance, reinforcing the importance of correct model application within specified chemical spaces. In this regard, the construction of a reference chemical space contributes to a deeper understanding of the chemical categories covered during validation. Notably, the majority of the validation datasets cover a substantial portion of the space representative of relevant chemical classes (drugs, industrial chemicals and natural products), providing confidence in the reliability of the achieved results.
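As a minimal sketch of how such a comparison might be computed, the snippet below evaluates R² on all predictions and on the subset flagged as inside the AD, and adds balanced accuracy (BA) for the classification case. The values and the `in_ad` flags are invented for illustration; in practice each tool reports AD membership in its own way.

```python
import statistics

def r_squared(observed, predicted):
    """Coefficient of determination of predictions against observed values."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

def balanced_accuracy(observed, predicted):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    return (tp / (tp + fn) + tn / (tn + fp)) / 2

def restrict_to_ad(observed, predicted, in_ad):
    """Keep only the records whose prediction falls inside the model's AD."""
    kept = [(o, p) for o, p, flag in zip(observed, predicted, in_ad) if flag]
    return [o for o, _ in kept], [p for _, p in kept]

# Invented regression example: the two out-of-AD predictions are poor.
observed = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
predicted = [1.1, 1.9, 3.2, 3.8, 2.0, 9.0]
in_ad = [True, True, True, True, False, False]

r2_all = r_squared(observed, predicted)
r2_in_ad = r_squared(*restrict_to_ad(observed, predicted, in_ad))

# Invented classification example.
ba = balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0])
```

Reporting the fraction of chemicals retained inside the AD alongside the restricted metric helps to show how much coverage is traded for the gain in accuracy.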

The limitations of the study were disclosed, suggesting the importance of the curation of the experimental data and the consistency of the experimental conditions to enhance the reliability and the predictive power of the computational models.

In this regard, the use of a consensus of different models and a weight-of-evidence approach may help in covering the limitations inherent to the single models, because each model potentially identifies and corrects errors made by the others when combined.
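A simple form of consensus can be sketched as follows, assuming an unweighted average for regression endpoints and a majority vote for classification endpoints. The tool names and values are hypothetical, and `None` is used here to mark a prediction that is unavailable (for example, because the chemical falls outside that model's AD). A weight-of-evidence scheme would typically replace the unweighted rules with weights based on each model's reliability.

```python
import statistics
from collections import Counter

def consensus_regression(predictions_by_model):
    """Average the available predictions for each chemical (None = missing)."""
    consensus = []
    for per_chemical in zip(*predictions_by_model.values()):
        available = [p for p in per_chemical if p is not None]
        consensus.append(statistics.fmean(available) if available else None)
    return consensus

def consensus_classification(labels_by_model):
    """Majority vote per chemical; ties and empty votes return None."""
    consensus = []
    for per_chemical in zip(*labels_by_model.values()):
        counts = Counter(l for l in per_chemical if l is not None).most_common(2)
        if not counts or (len(counts) == 2 and counts[0][1] == counts[1][1]):
            consensus.append(None)
        else:
            consensus.append(counts[0][0])
    return consensus

# Hypothetical predictions from three tools for three chemicals.
regression = {
    "tool_a": [1.0, 2.0, None],
    "tool_b": [1.4, 2.4, 3.0],
    "tool_c": [0.9, None, 3.4],
}
classification = {
    "tool_a": [1, 0, 1],
    "tool_b": [1, 1, 0],
    "tool_c": [0, 1, None],
}

mean_predictions = consensus_regression(regression)
voted_labels = consensus_classification(classification)
```

Returning `None` on a tie makes disagreement between models explicit instead of silently favouring one of them.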

Overall, the findings presented herein offer a valuable resource for researchers, regulatory authorities, and industry professionals seeking robust computational tools for high-throughput PC and TK assessment for chemical design, toxicity and environmental fate prediction, ultimately advancing the pursuit of safer and more sustainable chemical practices.

Availability of data and materials

The code used to perform the statistical analysis and to generate the figures included in the manuscript is available at https://github.com/DGadaleta88/benchmark_comp_tools_PC_TK. The datasets supporting the conclusions of this article are included within the article and its additional files.

Abbreviations

AD: Applicability domain
ADMET: Absorption, distribution, metabolism, excretion, and toxicity
AI: Artificial intelligence
BA: Balanced accuracy
BBB: Blood–brain barrier permeability
BP: Boiling point
Caco-2: Caco-2 permeability
F30%: Bioavailability at 30%
FCFP: Functional connectivity circular fingerprints
FUB: Fraction unbound to plasma proteins
HIA: Human intestinal absorption
HTS-PBK: High-throughput screening physiologically based kinetic model
logD: Octanol/water distribution ratio at pH = 7.4
logH: Henry’s law constant
logKp: Skin permeation
logP: Octanol/water partition coefficient
logVP: Vapor pressure
logWS: Water solubility
MAE: Mean absolute error
MCC: Matthews correlation coefficient
MP: Melting point
NAM: New approach methodology
PBK: Physiologically based kinetic
PC: Physicochemical
PCA: Principal component analysis
Pgp.inh: P-gp inhibitor
Pgp.sub: P-gp substrate
pKa-a: Acidic dissociation constant
pKa-b: Basic dissociation constant
QSAR: Quantitative structure–activity relationship
R²: Coefficient of determination
R²avg: Average coefficient of determination
RMSE: Root-mean-square error
SEN: Sensitivity
SPE: Specificity
TK: Toxicokinetic
TS: Training set

References

Kola I, Landis J (2004) Can the pharmaceutical industry reduce attrition rates? Nat Rev Drug Discov 3:711–715
Kubinyi H (2003) Drug research: myths hype and reality. Nat Rev Drug Discov 2:665–668
Song CM, Lim SJ, Tong JC (2009) Recent advances in computer-aided drug design. Brief Bioinform. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/bib/bbp023
Waring MJ, Arrowsmith J, Leach AR, Leeson PD, Mandrell S, Owen RM, Pairaudeau G, Pennie WD, Pickett SD, Wang J, Wallace O, Weir A (2015) An analysis of the attrition of drug candidates from four major pharmaceutical companies. Nat Rev Drug Discov 14:475–486
Ali H, Khan E, Ilahi I (2019) Environmental chemistry and ecotoxicology of hazardous heavy metals: environmental persistence toxicity and bioaccumulation. J Chem. https://doiorg.publicaciones.saludcastillayleon.es/10.1155/2019/6730305
Zhu M, Chen J, Peijnenburg WJ, Xie H, Wang Z, Zhang S (2023) Controlling factors and toxicokinetic modeling of antibiotics bioaccumulation in aquatic organisms: a review. Crit Rev Environ Sci Technol 53(15):1431–1451
Cheng F, Li W, Liu G, Tang Y (2013) In silico ADMET prediction: recent advances current challenges and future trends. Curr Top Med Chem 13:1273–1289
Wu F, Zhou Y, Li L, Shen X, Chen G, Wang X, Liang X, Tan M, Huang Z (2020) Computational approaches in preclinical studies on drug discovery and development. Front Chem 8:726
van de Waterbeemd H, Gifford E (2003) ADMET in silico modelling: towards prediction paradise? Nat Rev Drug Discov 2:192–204
Davis AM, Riley RJ (2004) Predictive ADMET studies the challenges and the opportunities. COCHBI 8:378–386
Hou T, Wang J (2008) Structure-ADME relationship: still a long way to go? Expert Opin Drug Metab Toxicol 4:759–770
Vinken M, Benfenati E, Busquet F, Castell J, Clevert D et al (2021) Safer chemicals using less animals: kick-off of the European ONTOX project. Toxicol 458:152846
Katritzky AR, Lobanov VS, Karelson M (1998) Normal boiling points for organic compounds: correlation and prediction by a quantitative structure−property relationship. J Chem Inf Comp Sci 38(1):28–41
Hall LH, Story CT (1996) Boiling point and critical temperature of a heterogeneous data set: QSAR with atom type electrotopological state indices using artificial neural networks. J Chem Inf Comput 36:1004–1014
Liu Y, Yu X, Chen J (2020) Quantitative structure–property relationship of distribution coefficients of organic compounds. SAR QSAR Env Res 31:585–596
Wu Z, Ramsundar B, Feinberg E et al (2018) MoleculeNet: a benchmark for molecular machine learning. Chem Sci 9(2):513–530
Modarresi H, Modarress H, Dearden J (2007) QSPR model of Henry’s law constant for a diverse set of organic chemicals based on genetic algorithm-radial basis function network approach. Chemosphere 66:2067–2076
Yao X, Liu M, Zhang X, Hu Z, Fan B (2002) Radial basis function network-based quantitative structure–property relationship for the prediction of Henry’s law constant. Anal Chim Acta 462:101–117
Hughes LD, Palmer DS, Nigsch F, Mitchell JB (2008) Why are some properties more difficult to predict than others? A study of QSPR models of solubility melting point and log P. J Chem Inf Model 48:220–232
Martel S et al (2013) Large chemically diverse dataset of logP measurements for benchmarking studies. Eur J Pharm Sci 48:21–29
Katritzky AR, Slavov SH, Dobchev DA, Karelson M (2007) Rapid QSPR model development technique for prediction of vapor pressure of organic compounds. Comput Aided Chem Eng 31:1123–1130
Katritzky AR, Maran U, Karelson M, Lobanov VS (1997) Prediction of melting points for the substituted benzenes: a QSPR approach. J Chem Inf Comput Sci 37(5):913–919
Habibi-Yangjeh A, Pourbasheer E, Danandeh-Jenagharad M (2008) Prediction of melting point for drug-like compounds using principal component-genetic algorithm-artificial neural network. BKCS 29:833–841
Avdeef A, Box KJ, Comer JE et al (1999) PH-metric log P 11, pKa determination of water-insoluble drugs in organic solvent–water mixtures. JPBA 20:631–641
Sander T, Freyss J, von Korff M, Rufener C (2015) DataWarrior: an open-source program for chemistry aware data visualization and analysis. J Cheminform 55:460–473
Liao C, Nicklaus M (2009) Comparison of nine programs predicting pKa values of pharmaceutical substances. J Chem Inf Model 49:2801–2812
Settimo L, Bellman K, Knegtel RMA (2014) Comparison of the accuracy of experimental and predicted pKa values of basic and acidic compounds. Pharm Res 31:1082–1095
Wang NN, Dong J, Deng YH, Zhu MF, Wen M, Yao ZJ, Lu AP, Wang JB, Cao DS (2016) ADME properties evaluation in drug discovery: prediction of Caco-2 cell permeability using a combination of NSGA-II and boosting. J Chem Inf Model 56:763–773
Pham-The H, González-Álvarez I, Bermejo M, Garrigues T, Le-Thi-Thu H et al (2013) The use of rule-based and QSPR approaches in ADME profiling: a case study on caco-2 permeability. Mol Inform 32:459–479
Tonnelier A, Coecke S, Zaldívar J (2012) Screening of chemicals for human bioaccumulative potential with a physiologically based toxicokinetic model. Arch Toxicol 86:393–403
Yamazaki K, Kanaoka M (2004) Computational prediction of the plasma protein-binding percent of diverse pharmaceutical compounds. J Pharm Sci 93:1480–1494
Lombardo F, Obach R et al (2002) Prediction of volume of distribution values in humans for neutral and basic drugs using physicochemical measurements and plasma protein binding data. J Med Chem 45:2867–2876
Riley RJ, McGinnity DF, Austin RP (2005) A unified model for predicting human hepatic metabolic clearance from in vitro intrinsic clearance data in hepatocytes and microsomes. Drug Metab Dispos 33:1304–1311
Votano JR, Parham M, Hall LM, Hall LH, Kier LB, Oloff S, Tropsha A (2006) QSAR modeling of human serum protein binding with several modeling techniques utilizing structure− information representation. J Med Chem 49:7169–7181
Zhu XW, Sedykh A, Zhu H, Liu SS, Tropsha A (2013) The use of pseudo-equilibrium constant affords improved QSAR models of human plasma protein binding. Pharm Res 30:1790–1798
Khajeh A, Modarress H (2014) Linear and nonlinear quantitative structure-property relationship modelling of skin permeability. SAR QSAR Env Res 25:35–50
Potts RO, Guy RH (1992) Predicting skin permeability. Pharm Res 9:663–669
ten Berge W (2009) A simple dermal absorption model: derivation and application. Chemosphere 75:1440–1445
Wang Z, Yang H, Wu Z, Wang T, Li W, Tang Y (2018) In silico prediction of blood-brain barrier permeability of compounds by machine learning and resampling methods. Chem Med Chem 13:2189–2201
Kim MT, Sedykh A, Chakravarti SK, Saiakhov RD, Zhu H (2014) Critical evaluation of human oral bioavailability for pharmaceutical drugs by using various cheminformatics approaches. Pharm Res 31:1002–1014
Fagerholm U, Hellberg S, Spjuth O (2021) Advances in predictions of oral bioavailability of candidate drugs in man with new machine learning methodology. Mol 26:2572
Wang NN, Huang C, Dong J, Yao ZJ, Zhu MF, Deng ZK, Lv B, Lu AP, Chen AF, Cao DS (2017) Predicting human intestinal absorption with modified random forest approach: A comprehensive evaluation of molecular representation unbalanced data and applicability domain issues. RSC Adv 7:19007–19018
Wang Z, Chen Y, Liang H, Bender A, Glen R et al (2011) P-glycoprotein substrate models using support vector machines based on a comprehensive data set. J Chem Inf Model 51:1447–1456
Li D, Chen L, Li Y, Tian S, Sun H, Hou T (2014) ADMET evaluation in drug discovery. 13. Development of in silico prediction models for P-glycoprotein substrates. Mol pharm 11(3):716–726
Broccatelli F, Carosati E, Neri A, Frosini M, Goracci L, Oprea TI, Cruciani G (2011) A novel approach for predicting P-glycoprotein (ABCB1) inhibition using molecular interaction fields. J Med Chem 54:1740–1751
Wishart DS, Knox C, Guo A et al (2006) DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res 34:668–667
van Santen JA, Poynton EF, Iskakova D, McMann E, Alsup TA, Clark TN, Fergusson CH, Fewer DP, Hughes AH, McCadden CA, Parra J, Soldatou S, Rudolf JD, Janssen EML, Duncan KR, Linington RG (2022) The natural products atlas 2.0: a database of microbially-derived natural products. Nucleic Acids Res 50:1317–1323
Mansouri K, Grulke C et al (2018) OPERA models for predicting physicochemical properties and environmental fate endpoints. J Cheminform 10:11–19
Benfenati E, Manganaro A, Gini G (2013) VEGA-QSAR: AI inside a platform for predictive toxicology. PAI@ AI* IA 1107:21–28.
Toma C, Gadaleta D, Roncaglioni A, Toropov A, Toropova A, Marzo M, Benfenati E (2019) QSAR development for plasma protein binding: influence of the ionization state. Pharm Res 36:1–9
US EPA (2012) Estimation Programs Interface Suite™ for Microsoft® Windows v 4.11, United States Environmental Protection Agency Washington. 2012
US EPA (2020) User’s Guide for TEST (version 5.1) (Toxicity Estimation Software Tool): a program to estimate toxicity from molecular structure. https://www.epa.gov/sites/default/files/2016-05/documents/600r16058.pdf.
Sushko I, Novotarskyi S, Körner R, Pandey A et al (2011) Online chemical modeling environment (OCHEM): Web platform for data storage model development and publishing of chemical information. J Comput Aided Mol Des 25:533–554
Tetko IV, Tanchuk VY, Kasheva TN, Villa A (2001) Estimation of aqueous solubility of chemical compounds using E-state indices. J Chem Inf Comput 41:1488–1493
Xiong G, Wu Z, Yi J, Fu L, Yang Z, Hsieh C, Cao D (2021) ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Res 49:5–14
Lee A et al (2009) Predicting pKa. J Chem Inf Model 49:2013–2033
Pires DE, Blundell TL, Ascher DB (2015) pkCSM: predicting small-molecule pharmacokinetic and toxicity properties using graph-based signatures. J Med Chem 58:4066–4072
Daina A, Michielin O, Zoete V (2017) SwissADME: a free web tool to evaluate pharmacokinetics drug-likeness and medicinal chemistry friendliness of small molecules. Sci Rep 7:42717
Daina A, Zoete V (2016) A boiled-egg to predict gastrointestinal absorption and brain penetration of small molecules. Chem Med Chem 11:1117–1121
Göller AH, Kuhnke L, Montanari F, Bonin A, Schneckener S, Ter Laak A, Wichard J, Lobell M, Hillisch A (2020) Bayer’s in silico ADMET platform: a journey of machine learning over the past two decades. Drug Discov Today 25:1702–1709
Ballabio D, Grisoni F, Todeschini R (2018) Multivariate comparison of classification performance measures. Chemom Intell Lab Syst 174:33–44
Berthold MR, Cebron N, Dill F, Gabriel TR, Kötter T, Meinl T, Wiswedel B (2008) KNIME: The Konstanz Information Miner, In: C, Preisach H, Burkhardt L, Schmidt-Thieme R, Decker (Eds,) Data Analysis Machine Learning and Applications: Proceedings of the 31st Annual Conference of the Gesellschaft für Klassifikation e,V (pp 319–326) Springer Berlin Heidelberg.
Dearden JC (2012) Prediction of physicochemical properties, In: Computational toxicology: Volume I (pp, 93–138).
Dearden JC, Worth A (2007) In silico prediction of physicochemical properties. JRC Sci Tech Rep EUR 23051:1–68
Ferreira LL, Andricopulo AD (2019) ADMET modeling approaches in drug discovery. Drug Discov Today 24:1157–1165
Kar S, Leszczynski J (2020) Open access in silico tools to predict the ADMET profiling of drug candidates. Expert Opin Drug Discov 15:1473–1487
Lombardo F, Gifford E, Shalaeva MY (2003) In silico ADME prediction: data models facts and myths. Mini-Rev Med Chem 3:861–875
Madden JC, Pawar G, Cronin MT, Webb S, Tan YM, Paini A (2019) In silico resources to assist in the development and evaluation of physiologically-based kinetic models. Comput Toxicol 11:33–49
Maltarollo VG, Gertrudes JC, Oliveira PR, Honorio KM (2015) Applying machine learning techniques for ADME-Tox prediction: a review. Expert Opin Drug Metab Toxicol 11:259–271
Mostrag-Szlichtyng A, Worth A (2010) Review of QSAR models and software tools for predicting biokinetic properties. Institute for Health and Consumer Protection European Union JRC Scientific and Technical Reports 1.
OECD (2007) OECD principles for the Validation for Regulatory Purposes of (Q)SAR Models, https://www.oecd.org/chemicalsafety/risk-assessment/37849783.pdf.
Gadaleta D, Mangiatordi GF, Catto M, Carotti A, Nicolotti O (2016) Applicability domain for QSAR models: where theory meets reality. IJQSPR 1:45–63
Gadaleta D, Lombardo A, Toma C, Benfenati E (2018) A new semi-automated workflow for chemical data retrieval and quality checking for modeling applications. J Cheminform 10:1–13
Mansouri K, Grulke CM, Richard AM, Judson RS, Williams AJ (2016) An automated curation procedure for addressing chemical errors and inconsistencies in public datasets used in QSAR modelling. SAR QSAR Env Res 27:911–937
Meanwell N (2011) Improving drug candidates by design: a focus on physicochemical properties as a means of improving compound disposition and safety. Chem Res Toxicol 24:1420–1456
Paul Gleeson M, Hersey A, Hannongbua S (2011) In-silico ADME models: a general assessment of their utility in drug discovery applications. Curr Top Med Chem 11:358–381
Yukawa T, Naven R (2020) Utility of physicochemical properties for the prediction of toxicological outcomes: Takeda perspective. ACS Med Chem Lett 11:203–209
Könnecker G, Regelmann J, Belanger S, Gamon K, Sedlak R (2011) Environmental properties and aquatic hazard assessment of anionic surfactants: physico-chemical environmental fate and ecotoxicity properties. Ecotoxicol Environ Saf 74:1445–1460
Theil F-P, Guentert T et al (2003) Utility of physiologically based pharmacokinetic models to drug development and rational drug discovery candidate selection. Toxicol Lett 138:29–49
Geci R, Gadaleta D, Garcia de Lomana M, Ortega-Vallbona R, Colombo E, Serrano-Candelas E, Paini A, Kuepfer L, Schaller S (2024) Systematic evaluation of high-throughput PBK modelling strategies for the prediction of intravenous and oral pharmacokinetics in humans. Arch Toxicol 98:1–18
Lombardo A, Roncaglioni A, Boriani E, Milan C, Benfenati E (2010) Assessment and validation of the CAESAR predictive model for bioconcentration factor (BCF) in fish. Chem Cent J 4:1–11

Acknowledgements

The authors acknowledge Floriane Montanari for her assistance and for her useful feedback in the first stage of the project.

Funding

This work was performed in the context of the ONTOX project (https://ontoxproject.eu/) which has received funding from the European Union’s Horizon 2020 Research and Innovation programme under grant agreement No 963845. ONTOX is part of the ASPIS project cluster (https://aspiscluster.eu/).

Author information

Pablo Aparicio-Sánchez
Present address: Spanish National Cancer Research Center (CNIO), Experimental Therapeutics Programme, Madrid, Spain
Domenico Gadaleta and Eva Serrano-Candelas contributed equally as first authors.

Authors and Affiliations

Laboratory of Environmental Chemistry and Toxicology, Department of Environmental Health Sciences, Istituto di Ricerche Farmacologiche Mario Negri IRCCS, Milan, Italy
Domenico Gadaleta, Erika Colombo, Giada Biava, Alessandra Roncaglioni & Emilio Benfenati
ProtoQSAR SL, CEEI (Centro Europeo de Empresas Innovadoras), 46980, Paterna, Valencia, Spain
Eva Serrano-Candelas, Rita Ortega-Vallbona, Pablo Aparicio-Sánchez & Rafael Gozalbes
Bayer AG, Machine Learning Research, Research & Development, Pharmaceuticals, Leverkusen, Germany
Marina Garcia de Lomana
Moldrug AI Systems SL, c/Olimpia Arozena Torres 45, 46018, Valencia, Spain
Rafael Gozalbes

Contributions

DG: Conceptualization, Methodology, Data Analysis, Writing – Original Draft, Writing – Review and Editing. ESC: Methodology, Data Collection, Data Analysis, Visualization, Writing – Review and Editing. ROV, EC, MGDL: Data Collection, writing – Review and Editing. GB, PAS: Data Collection. AR: Supervision, Writing – Review & Editing. RG, EB: Supervision, Writing – Review & Editing, Funding Acquisition.

Corresponding author

Correspondence to Domenico Gadaleta.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

MGDL is an employee of Bayer, while RG, ESC, and ROV are employees of ProtoQSAR. Bayer and ProtoQSAR are owners of software evaluated in this work. The affiliations of the authors with Bayer and ProtoQSAR are disclosed for transparency and potential conflict of interest considerations. However, the competing interests declared do not affect the impartiality, integrity, or validity of the research findings presented in this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

13321_2024_931_MOESM1_ESM.pdf

Additional file 1. Visualization of the Z-score analysis and of the comparison of records shared in multiple datasets to identify ‘intra-outliers’ and ‘inter-outliers’.

13321_2024_931_MOESM2_ESM.xlsx

Additional file 2. Curated SMILES of chemicals included in each validation dataset, predictions returned by validated models with information on the inclusion of predictions in the models’ AD and TS.

Additional file 3. Overview of the statistical parameters resulting from the validation of all the analyzed models.

13321_2024_931_MOESM4_ESM.pdf

Additional file 4. Correlation between the experimental values of chemicals in the validation datasets and the predictions by each model.

13321_2024_931_MOESM5_ESM.xlsx

Additional file 5. Lists of chemicals included in the three chemical categories (drugs, industrial chemicals and natural products) used to construct the reference chemical space.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Gadaleta, D., Serrano-Candelas, E., Ortega-Vallbona, R. et al. Comprehensive benchmarking of computational tools for predicting toxicokinetic and physicochemical properties of chemicals. J Cheminform 16, 145 (2024). https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13321-024-00931-z

Received: 27 May 2024
Accepted: 11 November 2024
Published: 26 December 2024
DOI: https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s13321-024-00931-z
