A BERT-based pretraining model for extracting molecular structural information from a SMILES sequence

Table 1 Hyperparameter settings of pretraining

	BERT MLM	2-encoder
Number of encoder layers	8, 10	8, 10
Number of heads	8, 16	8, 16
Dimension of molecular embedding	128, 256	128, 256
Mask rate	0.1	0.5
Learning rate	0.0003	0.0003
dropout rate	0.1	0.1

ISSN: 1758-2946