A beginner’s approach to deep learning applied to VS and MD techniques

D’Hondt, Stijn; Oramas, José; De Winter, Hans

doi:10.1186/s13321-025-00985-7

Journal of Cheminformatics

Table 3 Summary of DL models mentioned throughout the “Deep learning and virtual screening” section of this review used to aid in performing VS workflows

DL-VS method/tool	Description
LBVS-type screening step
IVS2vec [34]	DFCNN that takes as input ligand compound vectors generated by Mol2vec (an ML method producing high-dimensional embeddings of molecular structures) and performs a binary classification of proteins: those with a high versus a low probability of binding a query ligand
DEEPScreen [36]	Collection of 704 CNNs, each an individual predictor of favorable interactions between a query protein and small molecule ligands
DeepScreening [37, 39, 40, 42]	Freely accessible web server capable of training DL models for either classification tasks or regression tasks. For classification tasks, DFCNNs perform binding probability predictions on a provided chemical library. For regression tasks, RNNs generate a de novo compound library and then perform binding probability predictions against a query protein
Drug repurposing DFCNN by Zhang et al. [43, 45]	DFCNN that uses as input ligand compound vectors generated by Mol2vec and that is capable of predicting protein–ligand binding probabilities. It uses only molecular and chemical information of the compound vectors, not considering spatial information
DeepBindBC [43, 45]	DFCNN that uses as input protein–ligand complex structures as generated by AutoDock Vina (thus considering spatial information) and that can estimate protein–ligand binding probabilities
Generative model for VS steps
GAN by Andrianov et al. [47]	GAN consisting of an AE encoder and a DFCNN discriminator that generates molecular fingerprints of compounds similar to those in its training set. These fingerprints are then used to identify comparable compounds in existing compound libraries for further use in a drug discovery workflow
LSTM RNN by Arshia et al. [53]	LSTM RNN retrained through DTL from a network called LSTM_Chem capable of capturing the features of SMILES molecular representations. It was retrained through 10 generations of refinements to learn to generate SMILES of unique, original and valid compounds, each generation with better binding affinity to a query protein
WAE by Das et al. [64]	A type of VAE with a GRU encoder and decoder, able to capture the features of short peptide sequences (max. 25 amino acids). Using this model’s latent space and four bidirectional LSTM classifier models, the architecture can generate diverse, valid AMPs with broad-spectrum potency and low toxicity, used for further in silico, in vitro and in vivo testing
Binding affinity predictor
DeepBindRG [81]	ResNet that uses as input 2D binding interface-related matrices of protein–ligand complexes and predicts their binding affinity
Pafnucy [79]	Model built of convolutional and dense layers, capable of using 4D input information (3D coordinates and an additional feature vector) to predict the binding affinity of protein–ligand complexes
AEV-PLIG [104]	Attention-based GNN model that uses as input protein–ligand interaction graphs to capture the interplay of interactions determining binding affinity and predict binding affinities for the query complexes
Pose predictor
EquiBind [106]	Combination of a graph matching network and GNN that uses as input protein–ligand complex graphs to perform one-shot predictions of the optimal binding poses of query ligands in proteins (without binding affinity values)
TANKBind [107]	Similar GNN approach to EquiBind, using an additional bias parameter set to better prevent steric clashes and unrealistic conformations during the one-shot binding pose predictions. It also includes an additional module that allows for binding affinity predictions
DiffDock [110]	Diffusion generative model that starts with random conformations of a query ligand docked onto a protein and uses a reverse diffusion process to sample realistic protein–ligand binding poses and iteratively refine the system towards an optimal final binding pose prediction
AlphaFold 3 [112]	Attention-based architecture capable of predicting the 3D structure of proteins with unknown tertiary and quaternary structures based on their amino acid sequence, as well as predicting interactions with other proteins, small molecule ligands, nucleic acids, and modified or non-canonical residues
Generative model to replace VS steps
TargetDiff [119]	3D equivariant diffusion model that can generate 3D molecular structures befitting a query protein binding site, together with a binding affinity estimation
PILOT [120]	3D equivariant diffusion model that can generate 3D molecular structures befitting a query protein binding site (while maintaining high synthetic accessibility), together with a binding affinity estimation
Pocket2Mol [121]	E(3)-equivariant generative network that consists of a GNN generating 3D molecular structures befitting a query protein binding site (while maintaining drug properties such as drug likeness and synthetic accessibility) and a sampling algorithm that helps sample structures conditioned on the query pocket representation
FRAME [124]	Series of SE(3)-equivariant generative networks capable of generating 3D molecular structures befitting a query protein binding site. From a starting molecule, the architecture selects locations on which to add certain molecular fragments to better fit a protein pocket in question, until a certain user-specified goal is reached (e.g., molecular weight)
TacoGFN [122]	GFlowNet-based approach that can generate 3D molecular structures befitting a query protein binding site combined with a binding affinity estimation
AHC [127]	Reinforced Genetic Algorithm employing neural models to build, evolve and optimize 2D molecular structures with binding affinity to a query protein by optimizing a structure-explicit scoring function
AutoGrow 4 [126]	Genetic Algorithm that evolves and optimizes 2D molecular structures from random seeds into compounds with binding affinity to a query protein by optimizing a structure-explicit scoring function

This is a non-exhaustive list of available models, meant to show researchers the range of techniques at their disposal. It is divided into (1) models used to perform LBVS-type screening steps, (2) generative models used to generate datasets for further VS steps, (3) models used to predict binding affinity values of complexes, (4) models used to predict poses of ligands within query proteins, and (5) generative models used to entirely replace other VS techniques by generating molecules that fit within a binding pocket of interest.
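To make the "4D input" mentioned for Pafnucy in the table more concrete, the following is a minimal sketch of how atoms with 3D coordinates and per-atom feature vectors could be placed on a voxel grid of shape (x, y, z, features). It is an illustration only, not the published implementation: the function name `voxelize`, the box size, the resolution and the toy two-value feature vectors are all assumptions made for this example.

```python
import numpy as np

def voxelize(coords, features, box_size=20.0, resolution=1.0):
    """Place per-atom feature vectors on a cubic grid centred on the origin.

    coords:   (n_atoms, 3) Cartesian coordinates, assumed already centred
              on the binding site (hypothetical preprocessing step).
    features: (n_atoms, n_features) per-atom descriptors.
    Returns a 4D array of shape (n_bins, n_bins, n_bins, n_features).
    """
    coords = np.asarray(coords, dtype=float)
    features = np.asarray(features, dtype=float)
    n_bins = int(round(box_size / resolution)) + 1
    grid = np.zeros((n_bins, n_bins, n_bins, features.shape[1]))

    # Shift so the box spans [0, box_size], then map to the nearest voxel index.
    idx = np.rint((coords + box_size / 2.0) / resolution).astype(int)

    # Discard atoms that fall outside the box.
    inside = np.all((idx >= 0) & (idx < n_bins), axis=1)

    # np.add.at accumulates features when several atoms share one voxel.
    np.add.at(grid, tuple(idx[inside].T), features[inside])
    return grid

# Toy usage: two atoms near the centre and one far outside the box.
coords = [[0.0, 0.0, 0.0], [0.2, 0.0, 0.0], [100.0, 0.0, 0.0]]
features = [[1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
grid = voxelize(coords, features)
```

A tensor of this kind is what 3D convolutional layers would consume. Summing features for atoms that share a voxel is one possible design choice; other schemes, such as keeping only one atom per voxel, are equally conceivable.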


ISSN: 1758-2946
