The pucke.rs toolkit to facilitate sampling the conformational space of biomolecular monomers

Rihon, Jérôme; Reynders, Sten; Bernardes Pinheiro, Vitor; Lescrinier, Eveline

doi:10.1186/s13321-025-00977-7

Software
Open access
Published: 17 April 2025

Journal of Cheminformatics volume 17, Article number: 53 (2025)

Abstract

Understanding the structural and dynamic behaviour of molecules is a major objective in molecular modeling research. Sampling the torsional space is an efficient way to map this behaviour. However, generating a landscape of possible conformations relies on multiple formalisms whose mathematics are often difficult to convert to code. Here we present a command line tool and a scripting module that generate such landscapes, with different axes according to the various formalisms used for conformational sampling. In addition to this toolkit, we present a benchmarking study in which a DNA nucleoside is subjected to a diverse set of quantum mechanical levels of theory for geometry optimisations and potential energy calculations. The potential of the tool is demonstrated on examples including amino acids and synthetic nucleosides with five-membered or six-membered sugar moieties.

Scientific contribution

We provide an open-source command line tool and a library to facilitate conformational sampling of peptide-like molecules and of five-membered and six-membered rings. The tool also includes methods to analyse the obtained results and to interconvert the different puckering formalisms described in literature. Our benchmark for producing potential energy surfaces from conformational sampling allows the user to make informed decisions based on hardware availability and the desired quality of the results. The endpoints can be used for model building and to develop force field parameters for structure predictions of polymeric structures.

Introduction

Function and regulation of biopolymers is heavily influenced by conformational changes to amino acids and nucleosides that make up large proteins and nucleic acid structures respectively [1]. It remains challenging to observe these changes directly at the atomic level due to their dynamic nature. Nonetheless, computer simulations have provided an increasingly realistic picture on the properties of studied molecules. Molecular Dynamics (MD) simulations are often used to understand their behaviour by allowing molecules to visit one or more conformational states when freely interacting with their environment, or by imposing them with a biased restraint to force less favourable states to occur as well during the simulation [2]. This behaviour is defined by a classical force field, whose parameters are typically obtained from research on particular moieties of the studied molecule, by either fitting to experimental data or using ab initio calculations with Quantum Mechanics (QM) [3, 4, 5]. The latter field allows us to study virtual fragments through computational approaches and eventually predicts the effect on the conformational behaviour of larger molecules that are composed of such fragments [6, 7]. Traditional force fields, used in the analysis of biopolymer structures and their interactions, rely on the quantitative descriptors of these monomer building blocks [8]. In the field of synthetic nucleic acid (NA) research, this can guide the selection of new constructs prior to their synthesis in the lab. The development of Xenobiotic Nucleic Acids (XNAs) advanced the field towards viable therapeutics [9]. Chemical modifications in the original DNA structure prolonged the biological half-life of oligonucleotides (ONs) to a level suited for clinical applications and interactions to the target were optimised to increase potency and selectivity [10]. Modifying the backbone and the nucleobase expanded the field of XNA research [11, 12, 13]. 
To date, progress was made by systematic approaches of chemical synthesis and evaluating the viability of a multitude of different XNA constructs (Eschenmoser [14]). Molecular modeling can exploit in silico results and steer the selection of next-generation XNA-based therapeutics. A clear-cut tool is needed to facilitate obtaining a free energy landscape for ring puckering of new nucleotides, which can be used to derive the force field parameters required for MD simulations on larger constructs [15, 16, 17].

Here, we present the pucke.rs toolkit, which generates the axes needed to produce the energy landscape of a molecule through Conformational Sampling (CS). We demonstrate the toolkit with the adenosine nucleoside (dA), its 1,5-anhydrohexitol analogue (HNA) and the alanine amino acid, using QM to fully characterise the free energy landscapes by quantifying their potential wells and the transitions between different states. The axes provide a set of constraints for all possible conformations of a molecular type, which are applied in geometry optimisation (GO) procedures in QM in order to produce a set of optimised conformers. The potential energy of the optimised conformers is then calculated to obtain the energy levels of the data points on the different landscapes (Single Point Evaluation, SPE). Generating the axes for a specific landscape is accessible from the command line (pucke.rs) as well as from a Python3 module (pucke.py). The latter additionally provides the means to calculate puckering coordinates in different formalisms (Cremer-Pople [18], Altona-Sundaralingam [19] or Strauss-Pickett [20]), and it can use puckering coordinates to regenerate 3D structures of five- and six-membered rings (covering the most frequent XNA chemical modifications). The module can also be used for assessing monomers in a polymer structure, thus enabling the determination of context-dependent conformations. The fast Conformational Sampling of monomers does not impair the accuracy of model assembly and it establishes modeling as a tool for the discovery of novel XNAs.

Methods

Generating initial structures

The CS methodology consists of three parts, starting with the generation of torsion angle constraints to produce initial conformations, which cover the full conformational landscape. The pucke.rs CLI-tool provides one-liner queries to generate the desired axes of the landscape in order to sample the conformational landscape. The Python module (pucke.py) contains this functionality in the confsampling module and works analogously.

Using either tool requires the user to specify the type of molecular system (peptide, five-membered ring or six-membered ring) to be sampled (Figure 7). Both the peptide and five-membered ring systems employ a linear space function, which asks the user for the number of points to be computed in a preset range.

For the example at the end of the manuscript (Figure 6), the peptide landscape was queried to produce the axes ($\phi$, $\psi$) at an interval of 10$^\circ$ to generate 1369 distinct sets of ($\phi$, $\psi$)-constraints, for the inclusive range of $[0. \rightarrow 360.]$. The L-Alanine (Me-NH-Ala-CO-Me) was used to sample the peptide space (Figure 1A.) and did not require additional constraints. The axes were later transformed to $[-180. \rightarrow 180]$ for visual comparison of the energetically favourable regions of the peptide backbone, with the popular Ramachandran plot [21].
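The grid arithmetic above can be checked with a short standard-library sketch. It does not call pucke.rs or pucke.py; the helper names `linspace` and `to_signed` are hypothetical and only mimic a linear space function and the later shift of the axes to a Ramachandran-style range.

```python
from itertools import product


def linspace(start, stop, num):
    """Return `num` evenly spaced values from start to stop, both inclusive."""
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def to_signed(angle):
    """Map an angle in [0, 360] onto [-180, 180) for a Ramachandran-style axis."""
    return (angle + 180.0) % 360.0 - 180.0


# 37 points over the inclusive range [0, 360] give a 10 degree interval.
axis = linspace(0.0, 360.0, 37)

# Nested iteration over both axes: one (phi, psi) constraint pair per grid point.
constraints = list(product(axis, axis))

print(len(axis), axis[1] - axis[0])
print(len(constraints))  # 37 * 37 = 1369 pairs
print(to_signed(190.0), to_signed(350.0))
```

Because the range is inclusive, 0° and 360° both appear on each axis, which is why 37 rather than 36 points per axis are needed to reach the 1369 sets quoted above.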

For the dA ribose ring, the inclusive range $[-60. \rightarrow 60.]$ was sampled at 21 points per axis, i.e. at 6$^\circ$ intervals, on the $Z_x$ and $Z_y$ axes defined by Huang et al. [6]. For each generated data point, pucke.rs generates a set of ($\nu _1$, $\nu _3$)-constraints (Figure 1B.). Additional exocyclic constraints for the GO procedures were applied to generate five-membered ring conformers for the CS experiment: $\beta$: 208.5$^\circ$, $\gamma$: 30.9$^\circ$, $\epsilon$: 159.1$^\circ$ and $\chi$: 260.6$^\circ$.

For conformational sampling of a six-membered ring, a set of data points is generated on the surface of the Cremer-Pople (CP) sphere that represents its conformational space. The CP formalism lets us invert from spherical coordinates to atomic coordinates, which are then used by the Strauss-Pickett formalism to convert this information to improper dihedrals ($\alpha _1$, $\alpha _2$, $\alpha _3$). These are used as constraints to generate initial structures in the CS methodology (Figure 1C.). The sampling of the hexitol NA adenosine (hA) monomer was performed with the additional exocyclic constraints: $\beta$: 180.1$^\circ$, $\gamma$: 60.0$^\circ$, $\epsilon$: 180.1$^\circ$ and $\chi$: 210.59$^\circ$.

Queries of pucke.rs and pucke.py are given in Figure 7A–B.

Quantum Mechanics

The CS methodology is subjected to various computational chemistry approaches to assess consumption of resources and quality of their results. Each initial structure is subjected to a constrained Geometry Optimisation (GO) and a Single Point Evaluation (SPE) through QM approaches. Here the DNA adenosine nucleoside is used for benchmarking since its Potential Energy Surface (PES) is well described [6, 7]. This study utilises the computational chemistry package ORCA v5.0.4 [22], as this version applies the latest correction on the D4 dampening [23, 24] (kindly provided by the Grimme lab). While ORCA was used to perform standard geometry optimisations and single point energy calculations, any computational chemistry package that accepts additional constraints during their geometry optimisation procedure is compatible with this toolkit, as the produced constraints are printed to stdout. The user can then manipulate the output for their desired workflow.

The Gold Standard Quality (GSQ) was chosen to be second-order Møller-Plesset perturbation theory (MP2) [25]. The CI-CCSD(T) level of theory (LoT), generally considered one of the most accurate methods, does not lend itself well to geometry optimisations. The GSQ is accompanied by the 6-311++G(2df,2p) basis set [26, 27], with the Resolution of Identity (RI) approximation [28]. The def2-QZVPP/C auxiliary basis set [29] is used for the RI approximation of the MP2 density, together with def2/JK for the approximation of the Coulomb and exchange integrals [30], hereafter MP2$^{Q}$. The same basis set and approximations are used for the CS methodology at the ab initio Hartree-Fock (HF) level (HF$^{Q}$).

The semi-empirical HF-3c LoT [31, 32, 33] has been used before in the accelerated methodology [7] due to its cost efficiency and is employed here for comparison with the other methods. The hybrid functional PBE0 [34], with the D4 dampening, uses the same basis set as MP2$^{Q}$ but with the def2/J auxiliary set [35] (PBEO$^{Q}$). Both HF-3c and PBEO$^{Q}$ have been shown to yield optimised structures of great quality for a fraction of the cost of the pure wave function theory methods. All GOs are performed with the VeryTightOpt keyword.

Two further variants are MP2 with the def2-TZVP/C auxiliary basis set (MP2$^{T}$) [29] and HF$^{Q}$ without the RIJK approximation. These are used to compare consumables and quality of results against the other variants within the same LoT.

The dA molecule, which counts 31 atoms in total, uses 103 basis functions for HF-3c and 742 basis functions for the calculations with the PBEO$^{Q}$, the HF$^{Q}$ and the MP2$^{Q}$ LoT respectively. The benchmarking consists of comparing different protocols’ resources and assessing their RAM usage, wallclock time and Disk Space usage (the consumables) used during and after computations, and comparing this to the GSQ. Comparison of the GO quality will be done by going through a pairwise structure comparison, where differences are measured with the Kabsch RMSD algorithm [36] (https://github.com/charnley/rmsd). All sets of optimised geometries will also be subjected to an SPE with all four methodologies [HF-3c, PBEO$^{Q}$, HF$^{Q}$, MP2$^{Q}$].

The MAXCORE keyword is utilised to max out at 1500 MiB per thread engaged. For the GO part, every optimisation allocates six threads per conformation. A total of ten conformations, at one time, can be concurrently optimised. For the SPE, every evaluation allocates one thread per conformation with a total of 35 threads active at one time. Calculations were performed on a Ryzen ThreadRipper 3970 (32 cores / 64 threads) with a RAM capacity of 64 GiB.
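As a rough illustration of how these settings add up, the sketch below multiplies out the thread counts and the per-thread memory allowance quoted above. It assumes the 1500 MiB value is a nominal per-thread ceiling rather than a measured footprint; the function name `memory_ceiling_gib` is hypothetical.

```python
MIB_PER_GIB = 1024
INSTALLED_RAM_GIB = 64      # RAM of the workstation described above
HARDWARE_THREADS = 64       # 32 cores / 64 threads


def memory_ceiling_gib(concurrent_jobs, threads_per_job, maxcore_mib):
    """Nominal RAM ceiling if every thread used its full per-thread allowance."""
    return concurrent_jobs * threads_per_job * maxcore_mib / MIB_PER_GIB


# Geometry optimisation: ten conformations at a time, six threads each.
go_threads = 10 * 6
go_ceiling = memory_ceiling_gib(10, 6, 1500)

# Single point evaluation: 35 conformations at a time, one thread each.
spe_threads = 35 * 1
spe_ceiling = memory_ceiling_gib(35, 1, 1500)

print(f"GO : {go_threads} threads, ceiling {go_ceiling:.1f} GiB")
print(f"SPE: {spe_threads} threads, ceiling {spe_ceiling:.1f} GiB")
print("GO ceiling above installed RAM:", go_ceiling > INSTALLED_RAM_GIB)
```

Under this assumption the nominal ceiling of the GO stage lies above the installed RAM, so the setting is best read as an upper bound per thread and not as a reservation; actual consumption has to be monitored per level of theory.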

Potential energy surface

From a landscape of in silico generated and evaluated conformers of a molecular type, a PES is generated, as described by Mattelaer et al. [7]. The PES itself is expressed as the relative difference in energy ($\Delta E$) of all conformations with respect to the global minimum of the landscape (Figure 3). Differences in relative energy ($\Delta \Delta E$) between the calculated PES and GSQ are used to compare the different LoTs presented. The $\Delta \Delta E$ and RMSD maps are used to identify which combinations (GO-SPE) can best represent the optimal PES (Figure 4).
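The two quantities defined above reduce to simple bookkeeping on the energies of the grid points. The following sketch is an illustration only: the energies are made-up numbers in arbitrary units, the grid points are hypothetical, and the function names are not part of pucke.py.

```python
def relative_energies(energies):
    """Delta E: every grid point relative to the global minimum of its own landscape."""
    e_min = min(energies.values())
    return {point: e - e_min for point, e in energies.items()}


def delta_delta_e(candidate, reference):
    """Delta Delta E: difference between two Delta E landscapes on shared grid points."""
    d_cand = relative_energies(candidate)
    d_ref = relative_energies(reference)
    return {point: d_cand[point] - d_ref[point] for point in d_ref}


# Made-up energies (arbitrary units) on three (Zx, Zy) grid points, for illustration only.
reference = {(0, 0): -10.0, (6, 0): -12.5, (0, 6): -11.0}
candidate = {(0, 0): -20.0, (6, 0): -22.0, (0, 6): -21.5}

print(relative_energies(reference))
print(delta_delta_e(candidate, reference))
```

Referencing each landscape to its own minimum before subtracting removes the large constant offset between methods, so that only differences in the shape of the surface remain.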

Figures are made with Matplotlib and Cartopy. Figure 14 details the conventions used to define the CP six-membered ring space in relation to the mathematical convention for latitude-longitude coordinates, for graphical purposes. Cartopy, a Python geography library built on top of Matplotlib, was exploited to project the PES of six-membered ring systems onto the surface of the sphere. This was done with the Mollweide projection, with the data transformed by the PlateCarree projection. Because Q tends to stabilise around 0.67 for biologically relevant puckering modes, the amplitude was neglected and the CP coordinates $(Q, \phi _2, \theta )$ were simplified to 2D to better visualise the CP sphere. The oslo colour scheme was used [37] for the RMSD contour maps.
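Map projections generally expect longitude in $[-180, 180]$ and latitude in $[-90, 90]$, whereas the CP angles are spherical angles. The sketch below shows one plausible conversion, assuming that $\theta$ is a polar angle measured from the North Pole and that $\phi _2$ is an azimuth in $[0, 360]$. The exact convention of Figure 14 is not reproduced here and may differ by an offset or a sign; the function name `cp_to_lonlat` is hypothetical.

```python
def cp_to_lonlat(phi_deg, theta_deg):
    """Map Cremer-Pople angles onto geographic longitude/latitude in degrees.

    Assumed convention: theta is a polar angle (0 at the North Pole, 180 at the
    South Pole) and phi is an azimuth in [0, 360].
    """
    latitude = 90.0 - theta_deg
    longitude = (phi_deg + 180.0) % 360.0 - 180.0
    return longitude, latitude


print(cp_to_lonlat(0.0, 0.0))     # pole of the sphere
print(cp_to_lonlat(270.0, 90.0))  # a point on the equator
print(cp_to_lonlat(90.0, 180.0))  # opposite pole
```

With coordinates in this form, the amplitude Q plays no role in the plot, which matches the 2D simplification described above.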

Results and discussion

Generating the axes using the pucke.rs toolkit

In the first stage of generating a conformational landscape, the pucke.rs toolkit is used to select a grid system for a set of axes and to produce a set of geometry optimisation constraints for each grid point. These can then be imposed on the respective dihedrals to generate different conformations of the molecule that are required to construct the landscape.

The constraints of the peptide-like landscapes are simply procured by iterating over the backbone $\phi$-$\psi$ dihedral angles in a nested fashion, resulting in a 2D grid system with $\phi$-$\psi$ axes. The values gathered from iterating over these axes are directly used as the constraints of the particular dihedrals. Each grid point corresponds to a set of $\phi$-$\psi$ dihedrals to be used as constraints for GO procedures (Figure 1A.).

For ring systems, puckering formalisms are exploited, as they neatly abstract the conformation of an N-membered ring system to a set of coordinates. For the five-membered ring system, the methodology follows Huang et al. [6], who combined the Altona-Sundaralingam (AS) and Sato formalisms and projected them on a Cartesian system with ($Z_x, Z_y$) axes.

$$\begin{aligned} \nu _1&= Z_x \cos \left( \frac{4\pi }{5}\right) + Z_y \sin \left( \frac{4\pi }{5}\right) \\ \nu _3&= Z_x \cos \left( \frac{4\pi }{5}\right) - Z_y \sin \left( \frac{4\pi }{5}\right) \end{aligned}$$

(1)

By iterating over a set of $Z_x$ and $Z_y$ values, ranging from $[-60. \rightarrow 60.]$, a set of $(\nu _1, \nu _3)$ endocyclic torsion angles per grid point is calculated according to Equation 1, which was rearranged from Huang et al. to return the pair of endocyclic torsion angles [6]. The five-membered ring method returns a 2D grid, composed of sets of proper dihedrals to be used as constraints for GO procedures (Figure 1B.).
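Equation 1 and the grid iteration can be sketched in a few lines of standard-library Python. This is not the pucke.rs implementation; the function names are hypothetical, and the inverse relation is included only as a round-trip check of the rearrangement.

```python
import math

COS_4PI_5 = math.cos(4.0 * math.pi / 5.0)
SIN_4PI_5 = math.sin(4.0 * math.pi / 5.0)


def nu_constraints(zx, zy):
    """Equation 1: endocyclic torsions (nu1, nu3) from a (Zx, Zy) grid point."""
    return zx * COS_4PI_5 + zy * SIN_4PI_5, zx * COS_4PI_5 - zy * SIN_4PI_5


def zx_zy(nu1, nu3):
    """Inverse relation, recovering (Zx, Zy) from a pair of torsions."""
    return (nu1 + nu3) / (2.0 * COS_4PI_5), (nu1 - nu3) / (2.0 * SIN_4PI_5)


# 21 points on the inclusive range [-60, 60], i.e. one point every 6 degrees.
axis = [-60.0 + 6.0 * i for i in range(21)]

# One (Zx, Zy, nu1, nu3) record per grid point of the 2D landscape.
grid = [(zx, zy, *nu_constraints(zx, zy)) for zx in axis for zy in axis]

print(len(grid))
print(nu_constraints(30.0, -12.0))
print(zx_zy(*nu_constraints(30.0, -12.0)))
```

Each record pairs a grid point with the two proper dihedrals that would be imposed as constraints during the geometry optimisation.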

Sampling the six-membered ring space exploits two puckering formalisms. Through the use of the Cremer-Pople (CP) formalism, one can calculate a set of local elevations from a spherical coordinate ($Q, \theta, \phi$), which is an abstraction of a six-membered ring conformation. Reverse engineering (or inverting) the puckering coordinates to a full conformation has been theoretically detailed by Cremer [38] and applied by Sega et al. [39]. Starting from an equidistributed globe [40], the coordinates are passed into the function to calculate the set of local elevations per conformation. Based on assumptions for the magnitude of the bond lengths and bond angles, the atoms are assigned a position in $\mathbb{R}^3$. Next, the improper dihedrals ($\alpha _1,\alpha _2,\alpha _3$) (Strauss-Pickett formalism; SP) are computed and used as constraints. The sphere represents the CP sphere (Figure 1C.). The amplitude is kept constant at $Q = 0.67$, as this is the value at which biologically relevant six-membered rings exist [41]. Equation 2 defines the local elevations for all ring systems with an even number of atoms.

$$\begin{aligned} z_j = \sqrt{\frac{2}{N}} q_m \cos \left( \phi _m + 2\pi m \frac{j-1}{N}\right) + \frac{1}{\sqrt{N}} q_{m+1} (-1)^{j-1} \end{aligned}$$

(2)

For six-membered ring systems ($N = 6$, resulting in $m = 2$), the equation is simplified.

$$\begin{aligned} z_j = \sqrt{\frac{1}{3}} q_2 \cos \left( \phi _2 + 2\pi \frac{j-1}{3}\right) + \frac{1}{\sqrt{6}} q_3 (-1)^{j-1} \end{aligned}$$

(3)

By definition, the terms of Equation 3 can be expressed in spherical coordinates as:

$$\begin{aligned} q_2 = Q \sin (\theta ) , \quad q_3 = Q \cos (\theta ), \quad \phi = \phi _2 \end{aligned}$$

(4)

This results in Equation 5, which is used in the software to compute the set of local elevations of a six-membered ring system (iterating over $j = 0 \rightarrow 5$). The generated spherical coordinates $(Q, \theta , \phi )$ are passed to this function.

$$\begin{aligned} z_j = \left[ \sqrt{\frac{1}{3}} \sin (\theta ) \cos \left( \phi + \frac{2\pi j}{3}\right) + \frac{1}{\sqrt{6}} \cos (\theta ) (-1)^{j}\right] Q \end{aligned}$$

(5)
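As a minimal sketch of Equation 5, reading the sign factor as $(-1)^{j}$, the local elevations can be computed as follows. The function name `local_elevations` is hypothetical and is not the pucke.py API; the code only evaluates the formula and does not place atoms in 3D.

```python
import math


def local_elevations(q_total, theta, phi):
    """Equation 5: out-of-plane displacements z_j (j = 0..5) of a six-membered ring.

    q_total is the total puckering amplitude Q; theta and phi are in radians.
    """
    z = []
    for j in range(6):
        ring_term = math.sqrt(1.0 / 3.0) * math.sin(theta) * math.cos(phi + 2.0 * math.pi * j / 3.0)
        chair_term = math.cos(theta) * (-1) ** j / math.sqrt(6.0)
        z.append(q_total * (ring_term + chair_term))
    return z


Q = 0.67
pole = local_elevations(Q, 0.0, 0.0)                # theta = 0: alternating up/down pattern
equator = local_elevations(Q, math.pi / 2.0, 0.0)   # theta = 90 degrees, phi = 0

for name, z in (("pole", pole), ("equator", equator)):
    print(name, [round(v, 3) for v in z])
    # The elevations sum to zero and their squares sum to Q squared.
    print("  sum:", round(sum(z), 12), " sum of squares:", round(sum(v * v for v in z), 6))
```

Two properties of the formula make convenient sanity checks for any implementation: the elevations sum to zero, and the sum of their squares equals $Q^2$ for every $(\theta , \phi )$.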

The pucke.py module

In order to make the methodology more user-friendly, we provide both a CLI tool (pucke.rs, Figure 7A.) and a scripting library (pucke.py, a Python-wrapped Rust library) to allow users to implement either tool into their own workflows. The Rust language was a deliberate choice to ensure the robustness of the toolkit.

In the Python module, the CLI-tool has been implemented as the confsampling module (Figure 7D.). Additionally, the pucke.py module contains the formalism module that allows the user to calculate various puckering formalisms for five- (AS, CP5) and six-membered (SP, CP6) rings. Furthermore, the user can pass specific Cremer-Pople coordinates to the CP5(r, $\phi _2$) or CP6(r, $\phi _2$, $\theta$) class and invert these parameters to produce the 3D structure they have defined, as an xyz- or pdb-formatted file (Figure 2). This feature allows users to explore and understand the intricacies of the different formalisms, as well as derive specific constraints by the queried conformer (Figure 7B.-C.).

Additionally, the geometry module is supplemented with three functions to calculate molecular geometries (bond length, bond angle and torsion angle) by passing in coordinates from parsed molecule files. The library provides classes to manipulate pdb and xyz coordinate files, which parse the coordinates of the molecule in question (Figure 7E.). Various code examples on the pucke.py module are given in Figure 7 and on the GitHub repository.

The construction of a PES for 2'-deoxyadenosine

By evaluating the relative potential energy of the individual conformers in such a conformational landscape, represented by grid points, a potential energy surface is obtained. To demonstrate its applicability, conformational sampling was performed on a DNA nucleoside through geometry optimisation and single point evaluation procedures of the generated conformers. To aid researchers in applying the CS methodology, the pucke.rs tool is used in the construction of a PES. A comparison has been included using a selection of geometry optimisation and single point evaluation procedures at different LoTs in quantum mechanics.

Benchmarking on a local machine

A set of levels of theory [HF-3c, PBEO$^{Q}$, HF$^{Q}$ and MP2$^{Q}$] for the geometry optimisation procedure was applied on adenosine. Every set of optimised structures was then also subjected to potential energy evaluations by the four LoTs respectively. This generated sixteen different landscapes at various levels of accuracy to describe the behaviour of the DNA nucleoside (Figure 3). Obtained PESs were compared with Table 1 and the Consumables (Figure 8).

The selected GSQ, MP2$^{Q}$, was the heaviest computation, clocking in at roughly 548 h or about 22.8 days of calculations. The geometry optimisation capped out at about 48 GiB of RAM, while at most 40 GiB of tmp-files were stored on disk by ORCA when ten conformations were optimised concurrently. The HF-3c was logged for the same parameters and finished in about 0.7 h, capping at almost 3 GiB of RAM and almost 1 GiB of disk space in tmp-files. The GO experiment with HF$^{Q}$ finished around the 30 h mark and showed little hardware consumption compared to the GSQ, topping out at 10 GiB of RAM, with at most slightly more than 6 GiB of tmp-files produced by ORCA. The PBEO$^{Q}$ consumed about the same amount as the HF$^{Q}$, but clocked in at 58 h or 2.4 days (Figure 8A.,B.).
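To put these wallclock times in proportion, the short sketch below converts the approximate GO timings quoted in this paragraph to days and to a percentage of the MP2$^{Q}$ runtime. The numbers are the rounded values from the text, not the entries of Table 1, and the helper name `percent_of` is hypothetical.

```python
# Approximate wallclock times of the GO stage in hours, as quoted in the text.
go_hours = {"HF-3c": 0.7, "HF^Q": 30.0, "PBEO^Q": 58.0, "MP2^Q": 548.0}


def percent_of(hours, reference_hours):
    """Runtime expressed as a percentage of a reference runtime."""
    return 100.0 * hours / reference_hours


reference = go_hours["MP2^Q"]
for method, hours in go_hours.items():
    days = hours / 24.0
    share = percent_of(hours, reference)
    print(f"{method:7s} {hours:6.1f} h  {days:5.1f} d  {share:5.1f} % of MP2^Q")
```

On these rounded figures, the PBEO$^{Q}$ optimisation takes roughly a tenth of the MP2$^{Q}$ runtime and the HF-3c optimisation a small fraction of a percent.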

Table 1 Wallclock time to completion, with reference to the generated PESs from Figure 3. Chronometric data expressed in hours (h). The diagonal of the table highlights the wallclock time for only GO procedures, as SPE here are redundant

Table 1 shows that the SPE calculations took at most six hours in total to evaluate all adenosine conformers. The HF$^{Q}$ (max. 28.9 GiB) and MP2$^{Q}$ (max. 28.6 GiB) had similar RAM requirements, while the latter still required a tremendous amount of free disk space to store tmp-files. This is in contrast to the PBEO$^{Q}$, which needed at most 9 GiB of additional space on disk to run successfully. The SPE at the HF-3c level consumed a maximum of 6 GiB and ran to completion in under ten minutes (Figure 8C,D). The conclusion remains that the (HF-3c - MP2$^{Q}$) combination is still a robust competitor in approximating the GSQ, as these calculations are perfectly manageable within a single workday on hardware with 64 GiB of RAM and 32 cores. The established protocol by Mattelaer et al. [7] demonstrated high accuracy, used half of the resources and only 1% of the runtime with respect to the GSQ. To reduce the storing of tmp-files, one can allocate more RAM per thread.

To balance the resources for all the LoTs, a middle ground was sought so as not to exaggerate the resources allocated to the cheaper methods (HF-3c, PBEO$^{Q}$) while avoiding bottlenecking the expensive methods. This made comparing consumables more straightforward. These protocols, which are targeted at studies on organic compounds, can be further optimised according to the computational hardware available to the researcher. A comparison between (MP2$^{Q}$ vs. MP2$^{T}$) and (HF$^{Q}$ vs. HF$^{Q}$ without the RIJK approximation) is presented in Figure 9. To evaluate the quality of the different PESs, a closer look at the differences between individual results is required. Relative to the GSQ, we selected the PESs of the bottom row (Figure 3) for evaluation. A first analysis was done by taking the difference in relative energy with respect to the GSQ ($\Delta \Delta E$) (Figure 4A.). To look at the quality of the optimised conformations, we compared every conformation of a landscape in a pairwise fashion with the conformer with the same puckering coordinates in the GSQ and applied an RMSD algorithm [36] to calculate the difference in optimised structures (Figure 4B.). We highlight that the respective ranges in which we evaluate both the $\Delta \Delta E$ and the RMSD are small, indicating that the differences overall are minute. The PBEO$^{Q}$ best approximated the MP2$^{Q}$ at 11% of its runtime and produced the optimised conformations closest in resemblance to those of the GSQ in the parts of the landscape where it matters (e.g. minima and saddle points). Its only downside is the time investment with respect to HF-3c, as the PBEO$^{Q}$ took at least two days to run for such a landscape, while HF-3c ran for barely an hour, when all optimisations were run concurrently. The SPE calculations at the PBEO$^{Q}$ level are a strong alternative where hardware is limited, as the landscapes at this level most closely resemble the shape of the MP2$^{Q}$ SPE.
Figure 10 compares all SPE calculations to the GSQ ($\Delta \Delta E$) and Figure 11 compares all optimised geometries from the various LoTs with one another, for a holistic depiction of the results of this experiment.

Comparison with literature data

A second, useful application of the pucke.rs toolkit lies in the analysis of reported puckering coordinates in literature. Here, an article is discussed that set the foundation for all modern AMBER force fields [8].

To introduce the Cremer-Pople formalism, an apex (the first atom in the set) needs to be assigned for the set of local elevations ($z_j$). The original paper [18] conveniently placed the apex on the oxygen atom when characterising sugars, which results in {$\phi _2 = 0^\circ$} returning an $^{O'}$E conformation. When comparing this with the parametrisation in the paper of Cornell et al. [8] (AMBER force field parameters), we noticed that, while the authors used the CP formalism to denote the puckering modes of the conformers, the reported pucker coordinates do not follow the same apex. Rather, the apex of the formalism used in Cornell et al. seems to lie on the C3$'$, since the $^{3'}$E is closest to 0$^{\circ }$, but the phase angle has also been shifted by 18$^{\circ }$ ($\frac{\pi }{10}$), causing the $^{3'}$T$_{2'}$ to appear at the top of the plot. This, however, aligns exactly with the AS formalism [19] instead of the Cremer-Pople formalism.

Figure 5D depicts the reported conformers used from the Cornell paper [8] and adjacently the nearest pucker coordinate from the previous CS experiment (Figure 3, MP2$^{Q}$), as the AS formalism. Of note is the broad interpretation of the Envelope ranges, as they actually lean closer into Twist-territory. Figure 13 is a recreation of the conformations by their reported coordinates in the paper, using the invert method in pucke.py and highlights the software’s practicality in research. The [$^{2'}$E, $^{3'}$E, $^{O'}$E, E$_{O'}$] set of conformers are coupled to a relative energy value. From the GSQ $\Delta E$ values, these are (0.00, 1.66, 2.93, 4.65). In the Cornell paper, these are respectively at [$\epsilon =1$ : (0.00, 0.63, 2.87, 5.86)] and [$\epsilon =4$ : (0.00, 1.04, 1.86, 5.68)]. Any differences are attributed to the basis set used (6-31G$^*$) and the constraints at which the geometry optimisation was performed. We see the same trend of favourability in potential energy of the conformations in all three sets of results. The parametrised conformers also fall into place with the local and global minimum and the transitional states, showing the predictive quality of the CS methodology on the behaviour of these monomers.

For the puckering behaviour of the furanose in DNA, we see a global minimum around the $^{2'}$E area, which corresponds with standard DNA::DNA homoduplex configurations. At the local minimum, we find the $^{3'}$E conformer, which is often adopted when hybridising with different types of backbone chemistries. The PES depicts two transition states, commonly referred to as saddle points. The upper saddle point (5, 40) locates the $^{O'}$E conformation, while the lower saddle point at around (5, -15) depicts the E$_{O'}$ (Figure 5A). Again, this predicted behaviour of the DNA nucleoside falls in line with structural determination data. Figure 15A.,B. gives an overview of five-membered ring pucker modes.

Sampling of the peptide and six-membered ring systems

Finally, we briefly illustrate the applicability of the CS methodology by generating a PES for a peptide and for a six-membered ring. The methodology is generalisable to any five-membered and six-membered ring atomic system as well as any peptide-like system (two consecutive sp$^3$-hybridised torsion angles). These molecular systems encompass all chemical variants of biological monomers used in standard and synthetic biology.

The peptide PES (Figure 6A.) shows a global minimum around the same region where we would find alpha-helical conformers on a standard Ramachandran plot [43]. Up from this global well, we see one local maximum and an adjacent local minimum ($\phi$ = -45$^\circ$ $\rightarrow$ -135$^\circ$, $\psi$ = ±135$^\circ$). This is the region where beta-sheet conformers are situated. On the right-hand half of the PES, diagonally up from the alpha-helical conformers, we encounter the alpha$_L$-helical conformers, which are involved in left-handed helical protein structures. This behaviour tends to be true for all natural amino acids [21, 43].

Finally, the HNA chemistry has been subjected to a sampling. At the North Pole, we find the typical Chair ($^X$C$_W$) conformers (example in Figure 2B.). At its antipode, the South Pole, we find their inverse conformers ($^W$C$_X$). At the equator, we find various Boats ($^{X,W}$B, B$_{X,W}$) (example in Figure 2B.) and Skews ($^X$S$_Z$, $^Z$S$_X$) configurations. Around a latitude of 55$^\circ$, we encounter the Envelope ($^X$E, E$_Y$) and Twist ($^X$T$_Y$) (example in Figure 2B.) puckering modes, while at 135$^\circ$ we see their inverse puckering modes ($^Y$E, E$_X$; $^Y$T$_X$) (Figure 15C.,D.). From this PES, we can safely assess the stability of both North and South pole conformations for the HNA molecule, and we see three large local minima around the equator. Viewing the minima from left to right, it boasts the $^{3'}$S$_{1'}$, the B$_{O',3'}$ and the $^{O'}$S$_{2'}$ respectively.

Conclusion

Molecular dynamics simulations have proven useful to gain insight into, and to predict, how changes in the backbone impact the complementation properties of proteins and (synthetic) nucleic acids. Accurate parametrisation of the force field used in molecular dynamics is a prerequisite to obtain reliable results. Available tools to perform force field parametrisation rely on transitional pathways for conformational changes in fragments of the biomolecules, which can be derived from a PES. The CS methodology has been used previously to generate QM-based force field parameters for molecular dynamics simulations within AMBER for XNA with an alternative backbone chemistry (morpholino NA [16], threose NA [15] and HNA [17]). Here we introduce the pucke.rs CLI and pucke.py library, which streamline the CS methodology for the calculation of PESs for amino acids, five-membered and six-membered ring systems. These tools should facilitate research on creating new parameters and optimising established force fields. The pucke.rs toolkit and the Conformational Sampling methodology synergise with the Ducque model builder [16] (https://www.github.com/jrihon/Ducque), which allows the user to implement a custom repository of new nucleoside chemistries. Nucleoside conformers that function as building blocks for the virtual XNA duplexes can be curated from the generated landscape.

This work also demonstrates how different levels of theory in computational chemistry perform, both in terms of qualitative output and by logging their consumables, allowing the user to make informed decisions for their own experiments and what their hardware allows them to do. While this work based itself on the accelerated methodology of Mattelaer et al. [7], the goal was to explore the methodologies a researcher can use for this type of experiments.

The free and open-source toolkit allows for a pragmatic approach to these experiments by simplifying the workflow to what is actually desired: the sampling, defining and documenting of the configurational space of the biomolecular monomer of interest, as small molecules or in polymeric structures. No direct implementation exists between pucke.rs and ORCA, allowing the user to employ their QM package of choice for the CS methodology, with which one can then generate QM-based force field parameters for molecular dynamics simulations. The inversion method for the Cremer-Pople formalism is particularly useful to recreate conformations from literature, to reproduce and understand puckering behaviour.

Availability of data and materials

First releases of pucke.py and pucke.rs can be found through the FigShare link (https://figshare.com/s/e79632bb91ddb904b390, doi:10.6084/m9.figshare.26078641). Newer versions can be found on the respective GitHub repositories.

References

1. Lescrinier E, Froeyen M, Herdewijn P (2003) Difference in conformational diversity between nucleic acids with a six-membered ‘sugar’ unit and natural ‘furanose’ nucleic acids. Nucleic Acids Res 31(12):2975–2989
2. Babin V, Roland C, Darden TA et al (2006) The free energy landscape of small peptides as obtained from metadynamics with umbrella sampling corrections. J Chem Phys 125:20
3. Pérez A, Marchán I, Svozil D et al (2007) Refinement of the amber force field for nucleic acids: Improving the description of $\alpha$/$\gamma$ conformers. Biophys J 92(11):3817–3829. https://doi.org/10.1529/biophysj.106.097782
4. Zgarbová M, Otyepka M, Šponer J et al (2011) Refinement of the cornell et al nucleic acids force field based on reference quantum chemical calculations of glycosidic torsion profiles. J Chem Theory Comput 7(9):2886–2902. https://doi.org/10.1021/ct200162x
5. Zgarbová M, Šponer J, Otyepka M et al (2015) Refinement of the sugar-phosphate backbone torsion beta for amber force fields improves the description of z- and b-dna. J Chem Theory Comput 11(12):5723–5736. https://doi.org/10.1021/acs.jctc.5b00716
6. Huang M, Giese TJ, Lee TS et al (2014) Improvement of DNA and RNA sugar pucker profiles from semiempirical quantum methods. J Chem Theory Comput 10(4):1538–1545. https://doi.org/10.1021/ct401013s
7. Mattelaer CA, Mattelaer HP, Rihon J et al (2021) Efficient and accurate potential energy surfaces of puckering in sugar-modified nucleosides. J Chem Theory Comput 6:3814
8. Cornell WD, Cieplak P, Bayly CI et al (1995) A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. J Am Chem Soc 117(19):5179–5197. https://doi.org/10.1021/ja00124a002
9. Gait MJ, Agrawal S (2022) Introduction and history of the chemistry of nucleic acids therapeutics. Springer, US
10. Egli M, Manoharan M (2023) Chemistry, structure and function of approved oligonucleotide therapeutics. Nucleic Acids Res 51(6):2529–2573. https://doi.org/10.1093/nar/gkad067
11. Groaz E, Herdewijn P (2023) Hexitol nucleic acid (HNA): From chemical design to functional genetic polymer. Handbook of Chemical Biology of Nucleic Acids. Springer Nature Singapore, Cham. https://doi.org/10.1007/978-981-16-1313-5_15-1
12. Pinheiro VB, Taylor AI, Cozens C et al (2012) Synthetic genetic polymers capable of heredity and evolution. Science 336(6079):341–344. https://doi.org/10.1126/science.1217622
13. Yang H, Eremeeva E, Abramov M et al (2023) CRISPR-cas9 recognition of enzymatically synthesized base-modified nucleic acids. Nucleic Acids Res 51(4):1501–1511. https://doi.org/10.1093/nar/gkac1147
14. Eschenmoser A (2004) The tna-family of nucleic acid systems: Properties and prospects. Orig Life Evol Biosph 34(3):277–306. https://doi.org/10.1023/b:orig.0000016450.59665.f4
15. Reynders S, Rihon J, Lescrinier E (2025) Molecular modeling on duplexes with threose-based tna and tphona reveals structural basis for different hybridization affinity toward complementary natural nucleic acids. J Chem Theory Comput. https://doi.org/10.1021/acs.jctc.4c01316
16. Rihon J, Mattelaer CA, Montalvão RW et al (2024) Structural insights into the morpholino nucleic acid/rna duplex using the new xna builder ducque in a molecular modeling pipeline. Nucleic Acids Res 52(6):2836–2847. https://doi.org/10.1093/nar/gkae135
17. Schofield P, Taylor AI, Rihon J et al (2023) Characterization of an hna aptamer suggests a non-canonical g-quadruplex motif. Nucleic Acids Res 51(15):7736–7748. https://doi.org/10.1093/nar/gkad592
18. Cremer D, Pople J (1975) General definition of ring puckering coordinates. J Am Chem Soc 97(6):1354–1358
19. Altona C, Sundaralingam M (1972) Conformational analysis of the sugar ring in nucleosides and nucleotides new description using the concept of pseudorotation. J Am Chem Soc 94(23):8205–8212
20. Strauss HL, Pickett HM (1970) Conformational structure, energy, and inversion rates of cyclohexane and some related oxanes. J Am Chem Soc 92(25):7281–7290
21. Rosenberg AA, Yehishalom N, Marx A et al (2023) An amino-domino model described by a cross-peptide-bond ramachandran plot defines amino acid pairs as local structural units. Proc Natl Acad Sci USA. https://doi.org/10.1073/pnas.2301064120
22. Neese F, Wennmohs F, Becker U et al (2020) The orca quantum chemistry program package. J Chem Phys 152(22):224108
23. Caldeweyher E, Bannwarth C, Grimme S (2017) Extension of the d3 dispersion coefficient model. J Chem Phys. https://doi.org/10.1063/1.4993215
24. Caldeweyher E, Ehlert S, Hansen A et al (2019) A generally applicable atomic-charge dependent london dispersion correction. J Chem Phys. https://doi.org/10.1063/1.5090222
25. Cremer D (2011) Møller-plesset perturbation theory: from small molecule methods to methods for thousands of atoms. WIREs Comput Mol Sci 1(4):509–530. https://doi.org/10.1002/wcms.58
26. Frisch MJ, Pople JA, Binkley JS (1984) Self-consistent molecular orbital methods 25 supplementary functions for gaussian basis sets. J Chem Phys 80(7):3265–3269. https://doi.org/10.1063/1.447079
27. Krishnan R, Binkley JS, Seeger R et al (1980) Self-consistent molecular orbital methods .xx. a basis set for correlated wave functions. J Chem Phys 72(1):650–654. https://doi.org/10.1063/1.438955
28. Neese F (2003) An improvement of the resolution of the identity approximation for the formation of the coulomb matrix. J Comput Chem 24(14):1740–1747. https://doi.org/10.1002/jcc.10318
29. Hellweg A, Hättig C, Höfener S et al (2007) Optimized accurate auxiliary basis sets for ri-mp2 and ri-cc2 calculations for the atoms rb to rn. Theor Chem Acc 117(4):587–597. https://doi.org/10.1007/s00214-007-0250-5
30. Weigend F (2007) Hartree-fock exchange fitting basis sets for h to rn. J Comput Chem 29(2):167–175. https://doi.org/10.1002/jcc.20702
31. Grimme S, Antony J, Ehrlich S et al (2010) A consistent and accurate ab initio parametrization of density functional dispersion correction (dft-d) for the 94 elements h-pu. J Chem Phys. https://doi.org/10.1063/1.3382344
32. Grimme S, Ehrlich S, Goerigk L (2011) Effect of the damping function in dispersion corrected density functional theory. J Comput Chem 32(7):1456–1465. https://doi.org/10.1002/jcc.21759
33. Kruse H, Grimme S (2012) A geometrical correction for the inter- and intra-molecular basis set superposition error in hartree-fock and density functional theory calculations for large systems. J Chem Phys. https://doi.org/10.1063/1.3700154
34. Adamo C, Barone V (1999) Toward reliable density functional methods without adjustable parameters: The pbe0 model. J Chem Phys 110(13):6158–6170. https://doi.org/10.1063/1.478522
35. Weigend F (2006) Accurate coulomb-fitting basis sets for h to rn. Phys Chem Chem Phys 8(9):1057. https://doi.org/10.1039/b515623h
36. Kabsch W (1976) A solution for the best rotation to relate two sets of vectors. Acta Cryst Sect A 32(5):922–923. https://doi.org/10.1107/s0567739476001873
37. Crameri F (2023) Scientific colour maps (8.0.1). Zenodo. https://doi.org/10.5281/zenodo.8409685
38. Cremer D (1990) Calculation of puckered rings with analytical gradients. J Phys Chem 94:5502–5509
39. Sega M, Autieri E, Pederiva F (2011) Pickett angles and cremer–pople coordinates as collective variables for the enhanced sampling of six-membered ring conformations. Mol Phys 109(1):141–148. https://doi.org/10.1080/00268976.2010.522208
40. Deserno M (2004) How to generate equidistributed points on the surface of a sphere. Max-Planck-Institut für Polymerforschung, Ackermannweg 10, 55128 Mainz, Germany. https://www.cmu.edu/biolphys/deserno/pdf/sphere_equi.pdf
41. Haasnoot C (1992) The conformation of six-membered rings described by puckering coordinates derived from endocyclic torsion angles. J Am Chem Soc 114(3):882–887
42. Pettersen EF, Goddard TD, Huang CC et al (2004) Ucsf chimera-a visualization system for exploratory research and analysis. J Comput Chem 25(13):1605–1612. https://doi.org/10.1002/jcc.20084
43. Hollingsworth SA, Karplus PA (2010) A fresh look at the ramachandran plot and the occurrence of standard structures in proteins. BioMol Concepts 1(3–4):271–283. https://doi.org/10.1515/bmc.2010.022
44. Agirre J (2017) Strategies for carbohydrate model building, refinement and validation. Acta Cryst Sect D Struct Biol 73(2):171–186. https://doi.org/10.1107/s2059798316016910
45. IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN) (1983) Abbreviations and symbols for the description of conformations of polynucleotide chains. Eur J Biochem 131:9–15
46. Rings SM (1980) Conformational nomenclature for five and six-membered ring forms of monosaccharides and their derivatives. Eur J Biochem 100:295–298

Funding

This project was funded by project grant G085321N of the Research Foundation - Flanders (FWO) (to J.R. and S.R.) and project grant C14/19/102 of the KU Leuven Research Fund (to E.L. and V.B.P.).

Author information

Authors and Affiliations

Laboratory of Medicinal Chemistry, Department of Pharmaceutical and Pharmacological Sciences, Rega Institute for Medical Research, KU Leuven, Herestraat 49, 3000, Leuven, Belgium
Jérôme Rihon, Sten Reynders, Vitor Bernardes Pinheiro & Eveline Lescrinier

Contributions

J.R. and E.L. wrote the main manuscript. J.R. prepared all figures, programmed both software tools and carried out the main experiment. S.R. helped test both software tools. All authors reviewed the manuscript.

Corresponding author

Correspondence to Eveline Lescrinier.

Ethics declarations

Competing interests

The research group declares no conflicts of interest. The pucke.rs (https://github.com/jrihon/puckers) and pucke.py (https://github.com/jrihon/puckepy) repositories are available on GitHub, where their documentation can be found as well. Installation procedures are given in the respective repositories and are available for all major operating systems. Both tools are written in Rust, with pucke.py available as a Python library, and are released under the MIT license. The Rust toolchain (cargo) resolves the dependencies; no Python dependencies are required.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Appendix

Examples of CLI queries and the Python module

Consumables by the various levels of theory; GO and SPE.

Comparison of specifications within the MP2 and HF level of theory respectively.

$\Delta \Delta$E of all GO sampling respective to MP2-optimised structures.

RMSD of all GO sampling respective to MP2-optimised structures.

Inversion protocol of five- and six-ring systems

Inverted conformation of the “AMBER 2nd generation FF” paper (Cornell et al.) [8]

Definitions CP-sphere vs. Mollweide Projection

Puckering configurations and definitions

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Rihon, J., Reynders, S., Bernardes Pinheiro, V. et al. The pucke.rs toolkit to facilitate sampling the conformational space of biomolecular monomers. J Cheminform 17, 53 (2025). https://doi.org/10.1186/s13321-025-00977-7

Received: 21 June 2024
Accepted: 25 February 2025
Published: 17 April 2025
DOI: https://doi.org/10.1186/s13321-025-00977-7
