Table 1 Hyperparameter settings of pretraining

From: A BERT-based pretraining model for extracting molecular structural information from a SMILES sequence

| Hyperparameter | BERT MLM | 2-encoder |
| --- | --- | --- |
| Number of encoder layers | 8, 10 | 8, 10 |
| Number of heads | 8, 16 | 8, 16 |
| Dimension of molecular embedding | 128, 256 | 128, 256 |
| Mask rate | 0.1 | 0.5 |
| Learning rate | 0.0003 | 0.0003 |
| Dropout rate | 0.1 | 0.1 |
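The table above enumerates a small search grid: layers, heads, and embedding dimension each take two values, while mask rate is the only setting that differs between the two pretraining models. A minimal sketch of that grid in Python follows; the `PretrainConfig` dataclass and all identifiers are illustrative assumptions (the paper does not publish its configuration code), and the sketch only restates the values from Table 1.

```python
# Illustrative sketch only: restates the Table 1 grid; names are hypothetical.
from dataclasses import dataclass
from itertools import product

@dataclass
class PretrainConfig:
    model: str             # "BERT MLM" or "2-encoder"
    num_layers: int        # number of encoder layers: 8 or 10
    num_heads: int         # number of attention heads: 8 or 16
    embed_dim: int         # dimension of molecular embedding: 128 or 256
    mask_rate: float       # fraction of SMILES tokens masked
    learning_rate: float = 3e-4
    dropout: float = 0.1

# Mask rate is the only hyperparameter that differs between the two models.
MASK_RATE = {"BERT MLM": 0.1, "2-encoder": 0.5}

grid = [
    PretrainConfig(model, layers, heads, dim, MASK_RATE[model])
    for model, layers, heads, dim in product(MASK_RATE, (8, 10), (8, 16), (128, 256))
]
print(len(grid))  # 16 candidate settings: 2 models x 2 x 2 x 2
```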