From: A BERT-based pretraining model for extracting molecular structural information from a SMILES sequence
Pretraining hyperparameters for the two models:

| Hyperparameter                   | BERT MLM | 2-encoder |
|----------------------------------|----------|-----------|
| Number of encoder layers         | 8        | 10        |
| Number of heads                  | 8        | 16        |
| Dimension of molecular embedding | 128      | 256       |
| Mask rate                        | 0.1      | 0.5       |
| Learning rate                    | 0.0003   | 0.0003    |
| Dropout rate                     |          |           |
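As a concrete illustration, the sketch below shows how these settings could be wired into a standard PyTorch Transformer encoder and an MLM-style masking step. This is an assumption-laden sketch, not the authors' implementation: `build_encoder`, `mask_tokens`, the 4×d_model feed-forward width, the placeholder dropout of 0.1 (the table's dropout entry is truncated), and the mask token id are all illustrative choices.

```python
import torch
import torch.nn as nn

# Hyperparameters from the table; layout of this dict is an assumption.
CONFIGS = {
    "BERT MLM":  dict(num_layers=8,  num_heads=8,  d_model=128, mask_rate=0.1),
    "2-encoder": dict(num_layers=10, num_heads=16, d_model=256, mask_rate=0.5),
}
LEARNING_RATE = 3e-4  # 0.0003, as listed in the table


def build_encoder(num_layers: int, num_heads: int, d_model: int,
                  dropout: float = 0.1) -> nn.TransformerEncoder:
    """Stack of standard Transformer encoder layers.

    The 4*d_model feed-forward width follows common BERT convention and
    the dropout default is a placeholder; neither is given in the table.
    """
    layer = nn.TransformerEncoderLayer(
        d_model=d_model,
        nhead=num_heads,
        dim_feedforward=4 * d_model,
        dropout=dropout,
        batch_first=True,
    )
    return nn.TransformerEncoder(layer, num_layers=num_layers)


def mask_tokens(token_ids: torch.Tensor, mask_rate: float, mask_id: int = 0):
    """Randomly replace a fraction of SMILES token ids with a [MASK] id
    for masked-language-model pretraining (mask_id=0 is an assumption)."""
    mask = torch.rand(token_ids.shape) < mask_rate
    return token_ids.masked_fill(mask, mask_id), mask


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        enc = build_encoder(cfg["num_layers"], cfg["num_heads"], cfg["d_model"])
        # Dummy batch: 2 SMILES sequences of 64 token embeddings each.
        x = torch.randn(2, 64, cfg["d_model"])
        ids, mask = mask_tokens(torch.randint(1, 100, (2, 64)), cfg["mask_rate"])
        print(name, enc(x).shape, "masked tokens:", int(mask.sum()))
```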