
Table 9 Feature summary for each amino acid residue in proteins, each character of SMILES sequence of ligands, and each atom pair of the binding pocket

From: Distance plus attention for binding affinity prediction

 

|  | Features | Size | Values | Feature description |
| --- | --- | --- | --- | --- |
| Protein | one-hot encoding | 21 | 1 or 0 | 1 at the position of the amino acid index, 0 elsewhere |
|  | HHM | 30 | real values | various parameters from MSA |
|  | physicochemical properties | 7 | real values | steric parameter, hydrophobicity, volume, polarisability, isoelectric point, helix probability, sheet probability |
| Ligand | SMILES encoding | 1 | integer values | 64 unique characters, each mapped to an integer from 1 to 64 |
| Pocket | Distance bins | 1 | integer values | distances between protein and ligand atoms discretised into 41 bins, each bin mapped to an integer from 1 to 41 |

  1. Each protein can have at most 500 residues, and each ligand SMILES sequence can have at most 150 characters.
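
The encodings summarised in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation. The amino acid ordering, the use of `X` for non-standard residues, the toy SMILES vocabulary built by `build_vocab`, the padding value 0, and the 0.5 Å bin edges are all assumptions made for illustration; the table gives only the feature sizes (21, 64 and 41) and the footnote gives the maximum lengths (500 and 150).

```python
from bisect import bisect_right

# Assumed alphabet: 20 standard amino acids plus 'X' for anything else (21 total).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWYX"
MAX_RESIDUES = 500     # maximum protein length, from the table footnote
MAX_SMILES_LEN = 150   # maximum SMILES length, from the table footnote


def one_hot_residue(aa):
    """Return a 21-dim vector with a single 1 at the residue's index."""
    idx = AMINO_ACIDS.find(aa)
    if idx < 0:  # unknown residue: fall back to 'X' (assumption)
        idx = AMINO_ACIDS.index("X")
    vec = [0] * len(AMINO_ACIDS)
    vec[idx] = 1
    return vec


def encode_protein(seq, max_len=MAX_RESIDUES):
    """One-hot encode a sequence, truncate to max_len, pad with all-zero rows."""
    rows = [one_hot_residue(aa) for aa in seq[:max_len]]
    pad = [[0] * len(AMINO_ACIDS) for _ in range(max_len - len(rows))]
    return rows + pad


def build_vocab(smiles_list):
    """Hypothetical helper: map each distinct character to an integer from 1."""
    chars = sorted({ch for s in smiles_list for ch in s})
    return {ch: i for i, ch in enumerate(chars, start=1)}


def encode_smiles(smiles, vocab, max_len=MAX_SMILES_LEN):
    """Map characters to integers, truncate to max_len, pad with 0 (assumption)."""
    ids = [vocab[ch] for ch in smiles[:max_len]]
    return ids + [0] * (max_len - len(ids))


def distance_to_bin(distance, edges):
    """Map a distance to a bin index in 1..len(edges)+1."""
    return bisect_right(edges, distance) + 1


# Assumed bin edges: 40 edges at 0.5 Å steps give 41 bins; the last bin is open-ended.
BIN_EDGES = [0.5 * i for i in range(1, 41)]
```

Index 0 is used for padding here because the table reserves 1 to 64 for SMILES characters and 1 to 41 for distance bins; whether the original model pads the same way is not stated in the table.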