From: Positional embeddings and zero-shot learning using BERT for molecular-property prediction
Parameters | Pretraining | Fine-tuning | Position encoding/PEs |
---|---|---|---|
Learning rate | 1e−4 | 5e−6 |  |
Batch size | 16 | 16 |  |
Warm-up ratio | 0.016 | 0.1 |  |
Weight decay | 0.01 | 0.01 |  |
Number of epochs | 5 | 10 |  |
Optimizer | AdamW | AdamW |  |
Warm-up scheduler | Linear | Linear |  |
Number of parameters | 85,054,464 | 86,496,002 | Absolute |
 | 85,840,128 | 87,281,666 | Relative_key |
 | 85,840,128 | 87,281,666 | Relative_key_query |
 | 85,054,464 | 86,496,002 | Sinusoidal [52] |
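
The parameter counts in the table differ by position-encoding type: the two relative variants carry more parameters than the absolute and sinusoidal ones. The following is a minimal sketch of that arithmetic, not the authors' code. It assumes a BERT-base-like shape (12 layers, 12 heads, hidden size 768, 512 maximum positions) and assumes that each layer adds one learned distance-embedding table of shape (2 × max positions − 1) × head size, as the Hugging Face Transformers BERT implementation does for `relative_key` and `relative_key_query`. None of these shape values appear in the table, and the helper name `relative_pe_extra_params` is hypothetical.

```python
# Parameter counts copied from the pretraining column of the table above.
PRETRAIN_PARAMS = {
    "absolute": 85_054_464,
    "relative_key": 85_840_128,
    "relative_key_query": 85_840_128,
    "sinusoidal": 85_054_464,
}

# ASSUMPTIONS (not stated in the table): a BERT-base-like architecture.
NUM_LAYERS = 12
NUM_HEADS = 12
HIDDEN_SIZE = 768
MAX_POSITIONS = 512


def relative_pe_extra_params(num_layers, hidden_size, num_heads, max_positions):
    """Extra parameters if every layer holds one relative-distance table.

    Assumed table shape per layer: (2 * max_positions - 1) x head_size.
    """
    head_size = hidden_size // num_heads
    return num_layers * (2 * max_positions - 1) * head_size


# Difference reported in the table between relative and absolute encodings.
observed_delta = PRETRAIN_PARAMS["relative_key"] - PRETRAIN_PARAMS["absolute"]

# Difference predicted under the assumptions above.
expected_delta = relative_pe_extra_params(
    NUM_LAYERS, HIDDEN_SIZE, NUM_HEADS, MAX_POSITIONS
)

print(observed_delta, expected_delta)  # prints: 785664 785664
```

Under these assumptions the predicted overhead matches the table exactly. The fine-tuning column shows the same gap (87,281,666 − 86,496,002 = 785,664), which is consistent with the extra parameters belonging to the encoder rather than to the task head.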