Positional embeddings and zero-shot learning using BERT for molecular-property prediction

Table 9 Summary of F1 score and RMSE comparisons among positional encodings (PEs) using BERT

Task	Data	Sequence	Relative_key_query	Sinusoidal	Relative_key	Absolute
Classification (F1 score)	Malaria	SMILES	0.7439	0.6811	0.7254	0.6685
	Malaria	DeepSMILES	0.6532	0.6866	0.6913	0.6346
	COVID	SMILES	0.7568	0.7805	0.8000	0.7733
	COVID	DeepSMILES	0.7568	0.7671	0.8180	0.7733
	COVID-19	SMILES	0.7718	0.7950	0.7819	0.8049
	COVID-19	DeepSMILES	0.7179	0.7417	0.7654	0.7407
	Cocrystals	SMILES	0.5538	0.6713	0.6466	0.5538
	Cocrystals	DeepSMILES	0.5755	0.7285	0.6711	0.5152
	BBBP\(^{c_w}\)	SMILES	0.8092	0.8967	0.8498	0.8542
	BBBP\(^{c_w}\)	DeepSMILES	0.8456	0.7626	0.8512	0.8477
	BBBP	SMILES	0.8555	0.8531	0.9061	0.8400
	BBBP	DeepSMILES	0.8483	0.8473	0.9196	0.8483
	ClinTox	SMILES	0.9617	0.9577	0.8800	0.9580
	ClinTox	DeepSMILES	0.9617	0.9617	0.9231	0.9617
	Tox21\(^{c_w}\)	SMILES	0.9493	0.9058	–	0.8811
	Tox21\(^{c_w}\)	DeepSMILES	0.9139	0.9647	–	0.8348
	Tox21	SMILES	0.9688	0.9680	0.9680	0.9672
	Tox21	DeepSMILES	0.9688	0.9680	0.9680	0.9672
Regression (RMSE)	ESOL	SMILES	0.6185	0.5883	0.7878	0.5983
	ESOL	DeepSMILES	0.6557	0.6256	0.8431	0.5584
	FreeSolv	SMILES	1.8858	2.0491	2.6242	2.4169
	FreeSolv	DeepSMILES	2.1103	2.1572	2.0209	1.9840
	Lipophilicity	SMILES	0.5704	0.5732	0.5716	0.6025
	Lipophilicity	DeepSMILES	0.6333	0.6707	0.6857	0.6747

Bold values denote the best performance in each row (highest F1 score, lowest RMSE)
\(^{c_w}\) class-weighted function; DeepSMILES, zero-shot learning analysis of BERT; F1 score, classification tasks; RMSE (root-mean-square error), regression tasks
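Because the table mixes two metrics, "best" runs in opposite directions: the highest value wins for F1 score and the lowest wins for RMSE. The minimal sketch below, assuming exactly that convention, picks the winning PE for a row, reports every PE on a tie, and skips missing entries ("–", stored here as `None`). The names `PES` and `best_pes` are hypothetical helpers introduced only for illustration. The rows are copied verbatim from Table 9.

```python
# Hypothetical helper: pick the best positional encoding (PE) per Table 9 row.
# Assumption: higher is better for F1 (classification), lower is better for
# RMSE (regression). Ties return every tied PE; None marks a missing entry.

PES = ["Relative_key_query", "Sinusoidal", "Relative_key", "Absolute"]

def best_pes(values, metric):
    """Return the PE names that achieve the best score in one table row."""
    scored = [(pe, v) for pe, v in zip(PES, values) if v is not None]
    if metric == "F1":
        target = max(v for _, v in scored)
    elif metric == "RMSE":
        target = min(v for _, v in scored)
    else:
        raise ValueError(f"unknown metric: {metric}")
    return [pe for pe, v in scored if v == target]

# A few rows from Table 9: (data, sequence, metric, scores in PES order)
rows = [
    ("Malaria", "SMILES", "F1", [0.7439, 0.6811, 0.7254, 0.6685]),
    ("ClinTox", "DeepSMILES", "F1", [0.9617, 0.9617, 0.9231, 0.9617]),
    ("Tox21^cw", "SMILES", "F1", [0.9493, 0.9058, None, 0.8811]),
    ("ESOL", "DeepSMILES", "RMSE", [0.6557, 0.6256, 0.8431, 0.5584]),
    ("FreeSolv", "SMILES", "RMSE", [1.8858, 2.0491, 2.6242, 2.4169]),
]

for data, seq, metric, values in rows:
    print(data, seq, metric, best_pes(values, metric))
```

For example, the ClinTox DeepSMILES row yields a three-way tie (Relative_key_query, Sinusoidal, Absolute at 0.9617), while the ESOL DeepSMILES row is won by Absolute because 0.5584 is the lowest RMSE.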
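Of the four PEs compared, Sinusoidal is the only fixed (non-learned) scheme. The other three column names match the values accepted by the `position_embedding_type` option of Hugging Face's `BertConfig`, although the table itself does not say which implementation was used. As a rough illustration, the sketch below assumes the standard sinusoidal encoding from the original Transformer, \(PE_{(pos,2i)} = \sin(pos/10000^{2i/d})\) and \(PE_{(pos,2i+1)} = \cos(pos/10000^{2i/d})\). The function name `sinusoidal_encoding` and the toy sizes are hypothetical choices made for this example.

```python
import math

def sinusoidal_encoding(num_positions, d_model):
    """Return a num_positions x d_model table of fixed sinusoidal encodings.

    Assumes the standard Transformer scheme:
      PE[pos][2i]   = sin(pos / 10000**(2i / d_model))
      PE[pos][2i+1] = cos(pos / 10000**(2i / d_model))
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model // 2):
            # Each sin/cos pair shares one frequency; frequencies fall with i.
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table

# Toy example: encodings for the first 4 tokens of a tokenized SMILES string.
pe = sinusoidal_encoding(4, 8)
```

Unlike the learned Absolute embeddings, this table is computed rather than trained, so it adds no parameters and can be evaluated for any position index.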

ISSN: 1758-2946