From: A beginner’s approach to deep learning applied to VS and MD techniques
DL-VS method/tool | Description |
---|---|
LBVS-type screening step | |
IVS2vec [34] | DFCNN that takes as input ligand compound vectors generated by Mol2vec (an ML method producing high-dimensional embeddings of molecular structures) and performs binary classification of proteins as having either a high or a low probability of binding a query ligand |
DEEPScreen [36] | Collection of 704 CNNs, each an individual predictor of favorable interactions between a query protein and small molecule ligands |
| | Freely accessible web server capable of training DL models for either classification tasks or regression tasks. For classification tasks, DFCNNs perform binding probability predictions on a provided chemical library. For regression tasks, RNNs generate a de novo compound library and then perform binding probability predictions against a query protein |
| | DFCNN that uses as input ligand compound vectors generated by Mol2vec and that is capable of predicting protein–ligand binding probabilities. It uses only molecular and chemical information of the compound vectors, not considering spatial information |
| | DFCNN that uses as input protein–ligand complex structures as generated by AutoDock Vina (thus considering spatial information) and that can estimate protein–ligand binding probabilities |
Generative model for VS steps | |
GAN by Andrianov et al. [47] | GAN consisting of an AE encoder and a DFCNN discriminator that generates molecular fingerprints of compounds similar to those in its training set, so that comparable compounds can then be identified in existing compound libraries for further use in a drug discovery workflow |
LSTM RNN by Arshia et al. [53] | LSTM RNN retrained through DTL from a network called LSTM_Chem capable of capturing the features of SMILES molecular representations. It was retrained through 10 generations of refinements to learn to generate SMILES of unique, original and valid compounds, each generation with better binding affinity to a query protein |
WAE by Das et al. [64] | A type of VAE with a GRU encoder and decoder, able to capture the features of short peptide sequences (max. 25 amino acids). Using this model’s latent space and four bidirectional LSTM classifier models, the architecture can generate diverse, valid AMPs with broad-spectrum potency and low toxicity, used for further in silico, in vitro and in vivo testing |
Binding affinity predictor | |
DeepBindRG [81] | ResNet that uses as input 2D binding interface-related matrices of protein–ligand complexes and predicts their binding affinity |
Pafnucy [79] | Model built of convolutional and dense layers, capable of using 4D input information (3D coordinates and an additional feature vector) to predict the binding affinity of protein–ligand complexes |
AEV-PLIG [104] | Attention-based GNN model that uses as input protein–ligand interaction graphs to capture the interplay of interactions determining binding affinity and predict binding affinities for the query complexes |
Pose predictor | |
EquiBind [106] | Combination of a graph matching network and GNN that uses as input protein–ligand complex graphs to perform one-shot predictions of the optimal binding poses of query ligands in proteins (without binding affinity values) |
TANKBind [107] | Similar GNN approach to EquiBind, using an additional bias parameter set to better prevent steric clashes and unrealistic conformations during the one-shot binding pose predictions. It also includes an additional module that allows for binding affinity predictions |
DiffDock [110] | Diffusion generative model that starts with random conformations of a query ligand docked onto a protein and uses a reverse diffusion process to sample realistic protein–ligand binding poses, iteratively refining the system towards an optimal final binding pose prediction |
AlphaFold 3 [112] | Attention-based architecture capable of predicting, from their amino acid sequence, the 3D structure of proteins with unknown tertiary and quaternary structures, as well as their interactions with other proteins, small molecule ligands, nucleic acids, and modified or non-canonical residues |
Generative model to replace VS steps | |
TargetDiff [119] | 3D equivariant diffusion model that can generate 3D molecular structures befitting a query protein binding site, together with a binding affinity estimation |
PILOT [120] | 3D equivariant diffusion model that can generate 3D molecular structures befitting a query protein binding site (while maintaining high synthetic accessibility), together with a binding affinity estimation |
Pocket2Mol [121] | E(3)-equivariant generative network that consists of a GNN generating 3D molecular structures befitting a query protein binding site (while maintaining drug properties such as drug-likeness and synthetic accessibility) and a sampling algorithm that draws structures conditioned on the query pocket representation |
FRAME [124] | Series of SE(3)-equivariant generative networks capable of generating 3D molecular structures befitting a query protein binding site. From a starting molecule, the architecture selects locations at which to add molecular fragments to better fit the protein pocket in question, until a user-specified goal is reached (e.g., molecular weight) |
TacoGFN [122] | GFlowNet-based approach that can generate 3D molecular structures befitting a query protein binding site combined with a binding affinity estimation |
AHC [127] | Reinforced Genetic Algorithm employing neural models to build, evolve and optimize 2D molecular structures with binding affinity to a query protein by optimizing a structure-explicit scoring function |
AutoGrow 4 [126] | Genetic Algorithm that evolves and optimizes 2D molecular structures from random seeds into compounds with binding affinity to a query protein by optimizing a structure-explicit scoring function |
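
The genetic-algorithm entries that close the table share a common loop: score a population, keep the fittest candidates, and create new candidates by crossover and mutation. The sketch below illustrates that loop only and is not the AutoGrow 4 implementation. It rests on loud assumptions: candidates are plain bit lists rather than 2D molecular structures, and `toy_score` is a hypothetical stand-in for a structure-explicit scoring function such as a docking score. The names `toy_score` and `evolve` and all parameter values are illustrative.

```python
import random


def toy_score(candidate, target):
    """Hypothetical stand-in for a docking score (assumption, not a real scorer).

    Counts the positions at which the candidate matches a hidden target pattern;
    higher is better.
    """
    return sum(c == t for c, t in zip(candidate, target))


def evolve(target, pop_size=30, generations=40, mutation_rate=0.05, elite=2, seed=0):
    """Evolve bit-list candidates towards a high toy_score.

    Returns the best final candidate and the best score seen at the start of
    each generation.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    n = len(target)

    # Start from random seeds, analogous to a random initial compound population.
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best_per_generation = []

    for _ in range(generations):
        # Rank the population by score, best first.
        ranked = sorted(population, key=lambda c: toy_score(c, target), reverse=True)
        best_per_generation.append(toy_score(ranked[0], target))

        # Elitism: carry the top candidates over unchanged.
        next_population = [list(c) for c in ranked[:elite]]

        # Only the better half of the population may reproduce.
        parents = ranked[: pop_size // 2]
        while len(next_population) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_population.append(child)

        population = next_population

    best = max(population, key=lambda c: toy_score(c, target))
    return best, best_per_generation


if __name__ == "__main__":
    hidden_target = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
    best, history = evolve(hidden_target)
    print("best score per generation:", history)
    print("final best candidate:", best)
```

Because the elite candidates are copied unchanged into each new generation, the best score never decreases from one generation to the next. In a real workflow, the scoring call would be the expensive step, since each candidate would have to be docked against the query protein.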