| | Hyperparameter | Values |
|---|---|---|
| BNN | Activation | [ReLU] |
| | Batch normalization | [True] |
| | Skip connection | [True] |
| | Input layer | [768, 1024] |
| | Hidden layer dimension | [128] |
| | Number of hidden layers | [1] |
| | Dropout probability | [0.3] |
| Training | Optimizer | [Adam] |
| | Learning rate | [\(10^{-3}\)] |
| | Weight decay | [\(10^{-2}\)] |
| | Scheduler | [CosineAnnealingLR] |
| | T-max (LR cycle) | [10] |
| | Batch size | [16] |
| | Epochs | [110] |
| | Number of forward passes | [20] |
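
For readers who want these settings in machine-readable form, the table can be sketched as a small configuration object. This is a minimal illustration, not the authors' code: the class name `BNNConfig`, its field names, and the helper `cosine_annealing_lr` are hypothetical. The helper uses the standard closed-form cosine annealing expression with a minimum learning rate of 0, which is an assumption, since the table does not list a minimum.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BNNConfig:
    """Hypothetical container for the tabulated values (names are illustrative)."""

    # BNN settings
    activation: str = "ReLU"
    batch_norm: bool = True
    skip_connection: bool = True
    input_layer: tuple = (768, 1024)  # the two values listed for "Input layer"
    hidden_dim: int = 128
    num_hidden_layers: int = 1
    dropout: float = 0.3
    # Training settings
    optimizer: str = "Adam"
    learning_rate: float = 1e-3
    weight_decay: float = 1e-2
    scheduler: str = "CosineAnnealingLR"
    t_max: int = 10
    batch_size: int = 16
    epochs: int = 110
    num_forward_passes: int = 20


def cosine_annealing_lr(epoch, base_lr, t_max, eta_min=0.0):
    """Closed-form cosine annealing; eta_min=0.0 is an assumption."""
    cosine = math.cos(math.pi * epoch / t_max)
    return eta_min + 0.5 * (base_lr - eta_min) * (1.0 + cosine)


cfg = BNNConfig()
# Learning rate at the start of each epoch under the closed-form schedule.
schedule = [
    cosine_annealing_lr(epoch, cfg.learning_rate, cfg.t_max)
    for epoch in range(cfg.epochs)
]
```

Under this closed form, the learning rate starts at \(10^{-3}\), reaches the assumed minimum of 0 at epoch 10 (T-max), and returns to \(10^{-3}\) at epoch 20, so 110 epochs span several such cycles. Whether the original training followed exactly this curve depends on scheduler details the table does not state.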