A systematic review of deep learning chemical language models in recent era

Table 2 Descriptive statistics and P-values for the training dataset sizes and performance metrics of generative models that implemented transfer learning (TL)

Metric	Unbiased model	Target model	Samples	P-value
Training dataset size	1,128,920	2,507	17	< 0.0001
Validity	98.05	95.5	10	0.1602
Uniqueness	97.9	90.2	11	0.0144
Novelty	91.6	96.0	8	0.8438

Median values are reported for both the unbiased and target models. P-values were calculated with the Mann–Whitney U test for paired samples. The Samples column gives the number of articles that met the metric reporting criteria required for the analysis
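As a rough illustration of how medians and a Mann–Whitney U P-value of the kind reported in Table 2 can be obtained, the sketch below uses only the Python standard library. The uniqueness values are hypothetical placeholders, not data from the reviewed articles, and the function names are our own. The sketch treats the two groups as independent, which is how the Mann–Whitney U test is usually defined; a strictly paired design would normally call for the Wilcoxon signed-rank test instead. The exact P-value is found by enumerating every relabeling of the pooled values, which is only practical for small samples such as those in the table.

```python
from itertools import combinations
from statistics import median


def u_statistic(x, y):
    """U for group x: number of pairs (xi, yj) with xi > yj; ties count 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0
        for xi in x
        for yj in y
    )


def mann_whitney_exact(x, y):
    """Return (U, two-sided exact P-value) by enumerating all relabelings."""
    pooled = list(x) + list(y)
    n_x, n_y = len(x), len(y)
    center = n_x * n_y / 2  # expected U under the null hypothesis
    observed_u = u_statistic(x, y)
    observed_dev = abs(observed_u - center)
    hits = total = 0
    for idx in combinations(range(len(pooled)), n_x):
        chosen = set(idx)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count relabelings at least as extreme as the observed split.
        if abs(u_statistic(group_x, group_y) - center) >= observed_dev - 1e-12:
            hits += 1
    return observed_u, hits / total


# Hypothetical uniqueness scores (%) from five articles; NOT the review's data.
unbiased = [99.1, 98.4, 97.9, 99.6, 96.8]
target = [90.2, 85.0, 93.3, 88.7, 95.1]

median_unbiased = median(unbiased)
median_target = median(target)
u_value, p_value = mann_whitney_exact(unbiased, target)

print(f"median unbiased = {median_unbiased}, median target = {median_target}")
print(f"U = {u_value}, exact two-sided P = {p_value:.4f}")
```

Exact enumeration avoids the normal approximation, which can be unreliable for sample counts as small as 8 to 17; for larger samples a statistics library would be the more practical choice.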

ISSN: 1758-2946