Predicting how proteins interact with one another - that is, which surfaces
of one protein bind to which surfaces of another protein - is a central problem
in biology. Here we present Siamese Atomic Surfacelet Network (SASNet), the
first end-to-end learning method for protein interface prediction. Despite
using only spatial coordinates and identities of atoms as inputs, SASNet
outperforms state-of-the-art methods that rely on complex, hand-selected
features. These results are particularly striking because we train the method
entirely on a significantly biased data set that does not account for the fact
that proteins deform when binding to one another. Nonetheless, our network
maintains high performance, without retraining, when tested on real cases in
which proteins do deform. This suggests that it has learned fundamental
properties of protein structure and dynamics, which has important implications
for a variety of key problems related to biomolecular structure.

Let Sn denote the network of all RNA secondary structures of length n, in
which undirected edges exist between structures s, t such that t is obtained
from s by the addition, removal or shift of a single base pair. Using
context-free grammars, generating functions and complex analysis, we show that
the asymptotic average degree is O(n) and that the asymptotic clustering
coefficient is O(1/n), from which it follows that the family Sn, n = 1,2,3,...
of secondary structure networks is not small-world.
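
As an illustration of the network defined above, the following Python sketch
enumerates Sn by brute force for small n and computes the average degree and
the average clustering coefficient directly. It is not the grammar-based
asymptotic analysis described in the abstract. All function names are
hypothetical, and three modelling choices are assumptions not stated above:
any two positions may pair (a homopolymer model), a hairpin loop must contain
at least min_loop = 3 unpaired bases, and a shift move replaces one base pair
by another that shares exactly one position with it.

```python
from itertools import combinations


def secondary_structures(n, min_loop=3):
    """Return all secondary structures on positions 0..n-1 as frozensets of pairs."""

    def build(i, j):
        # Interval too short to hold any base pair: only the empty structure.
        if j - i <= min_loop:
            return [frozenset()]
        # Case 1: position j is unpaired.
        result = list(build(i, j - 1))
        # Case 2: position j pairs with k, with at least min_loop bases between.
        for k in range(i, j - min_loop):
            for left in build(i, k - 1):
                for inner in build(k + 1, j - 1):
                    result.append(left | inner | {(k, j)})
        return result

    return build(0, n - 1)


def are_neighbors(s, t):
    """True if t is obtained from s by adding, removing or shifting one base pair."""
    diff = s ^ t
    if len(diff) == 1:
        return True  # one pair added or removed
    if len(diff) == 2 and len(s) == len(t):
        (a, b), (c, d) = diff
        return len({a, b, c, d}) == 3  # assumed shift: pairs share one position
    return False


def build_network(n, min_loop=3):
    """Adjacency sets of the (assumed) move-set network on all structures."""
    nodes = secondary_structures(n, min_loop)
    adj = {s: set() for s in nodes}
    for s, t in combinations(nodes, 2):
        if are_neighbors(s, t):
            adj[s].add(t)
            adj[t].add(s)
    return adj


def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def average_clustering(adj):
    """Mean local clustering coefficient; nodes of degree < 2 contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)
```

Under these assumptions, n = 6 gives four structures joined by five edges, so
average_degree(build_network(6)) is 2.5. The enumeration grows exponentially
with n, so the sketch is only practical for short lengths.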

Computational procedures for predicting the 3D structure of aptamers are
steadily improving. They provide crucial input to research, especially when no
crystallographic counterpart of the structures produced in silico is
available. At present, many codes can perform structure and binding
prediction, although their ability to score the results remains rather weak.
In this paper, we propose a novel procedure to complement the ranking outcomes
of free docking codes and apply it to a set of anti-angiopoietin aptamers
whose performance is known. We rank the configurations produced in silico
using a maximum likelihood estimate based on their topological and
electrical properties. The analysis identifies two principal kinds of
conformers, and we discuss their ability to mimic the binding features of the
natural receptor. The procedure is easily generalizable to many biomolecules
and is useful for increasing the chances of success in designing
high-specificity biosensors (aptasensors).

Specific protein-protein interactions are crucial in most cellular processes.
They enable multi-protein complexes to assemble and to remain stable, and they
allow signal transduction in various pathways. Functional interactions between
proteins result in coevolution between the interacting partners, and thus in
correlations between their sequences. Pairwise maximum-entropy based models
have enabled successful inference of pairs of amino-acid residues that are in
contact in the three-dimensional structure of multi-protein complexes, starting
from the correlations in the sequence data of known interaction partners.
Recently, algorithms inspired by these methods have been developed to identify
which proteins are specific interaction partners among the paralogous proteins
of two families, starting from sequence data alone. Here, we demonstrate that a
slightly higher performance for partner identification can be reached by an
approximate maximization of the mutual information between the sequence
alignments of the two protein families....
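
The quantity being maximized can be illustrated with a small Python sketch.
It assumes a plug-in (frequency-count) estimator without pseudocounts and sums
the mutual information over all inter-family column pairs for a given pairing
of the rows. Both choices, and all function names, are illustrative
assumptions rather than the procedure used in this work.

```python
import math
from collections import Counter


def column_mutual_information(col_a, col_b):
    """Plug-in mutual information (in nats) between two paired alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p_ab * log(p_ab / (p_a * p_b)), written with raw counts
        mi += (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
    return mi


def pairing_score(rows_a, rows_b):
    """Sum of column-pair mutual information for a given pairing of sequences.

    rows_a[m] is assumed to be paired with rows_b[m]; all rows within a
    family are aligned and therefore have equal length.
    """
    cols_a = list(zip(*rows_a))
    cols_b = list(zip(*rows_b))
    return sum(column_mutual_information(ca, cb) for ca in cols_a for cb in cols_b)
```

A pairing in which residues covary perfectly scores higher than a shuffled
one: pairing_score(["AC", "AC", "LD", "LD"], ["K", "K", "E", "E"]) equals
2 log 2, while reordering the second family to ["K", "E", "K", "E"] gives 0.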

All known terrestrial proteins are coded as continuous strings over an alphabet
of ~20 amino acids. The patterns formed by the repetitions of elements in groups
of finite sequences describe the natural architectures of protein families. We present a
method to search for patterns and groupings of patterns in protein sequences
using a mathematically precise definition for 'repetition', an efficient
algorithmic implementation and a robust scoring system with no adjustable
parameters. We show that the sequence patterns can be well-separated into
disjoint classes according to their recurrence in nested structures. The
statistics of pattern occurrences indicate that short repetitions are enough to
account for the differences between natural families and randomized groups by
more than 10 standard deviations, while patterns shorter than 5 residues are
effectively random. A small subset of patterns is sufficient to account for a
robust 'familiarity' definition of arbitrary sets of sequences.

RNA forms elaborate secondary structures through intramolecular base pairing.
These structures perform critical biological functions within each cell. Because
a polynomial-time algorithm is available to calculate the partition function
over these structures, they are also a suitable model system for the statistical
physics of disordered systems. In this model, below the denaturation
temperature, random RNA secondary structures exist in one of two phases: a
strongly disordered, low-temperature glass phase, and a weakly disordered,
high-temperature molten phase. The probability that two bases pair decays as a
power law of their distance, with an exponent of 3/2 in the molten phase and
about 4/3 in the glass phase. Inspired by previous results from a renormalized
field theory of the glass transition separating the two phases, we numerically
study this transition. We introduce a distinct order parameter for each phase,
both of which vanish at the critical point. Finally, we explore the driving
mechanism behind
this transition.
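
The polynomial-time computation mentioned above can be sketched in Python as
follows. This is a minimal O(n^3) recursion that assumes the energy of a
structure is a sum of independent pair energies, a common simplification for
random RNA models but an assumption here. The function names, the pair_energy
callback and the min_loop parameter are hypothetical.

```python
import math


def partition_table(n, pair_energy, beta=1.0, min_loop=0):
    """Z[i][j] = partition function over structures on bases i..j (inclusive)."""
    Z = [[1.0] * n for _ in range(n)]

    def z(i, j):
        # Empty intervals carry weight 1 (only the empty structure).
        return Z[i][j] if i <= j else 1.0

    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            total = z(i, j - 1)  # base j unpaired
            for k in range(i, j - min_loop):  # base j paired with base k
                weight = math.exp(-beta * pair_energy(k, j))
                total += z(i, k - 1) * weight * z(k + 1, j - 1)
            Z[i][j] = total
    return Z


def end_pair_probability(n, pair_energy, beta=1.0, min_loop=0):
    """Probability that the first and last bases of the chain are paired."""
    if n - 1 <= min_loop:
        return 0.0  # chain too short for the two ends to pair
    Z = partition_table(n, pair_energy, beta, min_loop)
    inner = Z[1][n - 2] if n >= 3 else 1.0
    weight = math.exp(-beta * pair_energy(0, n - 1))
    return weight * inner / Z[0][n - 1]
```

With all pair energies set to zero and min_loop = 0, the partition function
simply counts structures, so partition_table(5, lambda i, j: 0.0)[0][4]
equals 21. Disorder would enter through a random pair_energy, for example
values drawn once per pair (i, j) and stored in a lookup table.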

We construct a one-bead-per-residue coarse-grained dynamical model to
describe intrinsically disordered proteins at significantly longer timescales
than those accessible to all-atom models. In this model, inter-residue contacts form and
disappear during the course of the time evolution. The contacts may arise
between the sidechains, the backbones or the sidechains and backbones of the
interacting residues. The model yields results consistent with much of the
all-atom simulation and experimental data on these systems. We demonstrate that the
geometrical properties of various homopeptides differ substantially in this
model. In particular, the average radius of gyration scales with the sequence
length in a residue-dependent manner.
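
The scaling statement above can be made concrete with a short Python sketch.
It assumes equal-mass beads, one per residue, and a power-law form
Rg ~ N^nu whose exponent is estimated by a least-squares fit in log-log
space. The function names are hypothetical and the fit is only an
illustration, not the analysis performed with the model.

```python
import math


def radius_of_gyration(coords):
    """Radius of gyration of equal-mass beads given as (x, y, z) tuples."""
    n = len(coords)
    center = [sum(c[d] for c in coords) / n for d in range(3)]
    mean_sq = sum(
        sum((c[d] - center[d]) ** 2 for d in range(3)) for c in coords
    ) / n
    return math.sqrt(mean_sq)


def scaling_exponent(lengths, rg_values):
    """Least-squares slope of log(Rg) against log(N), i.e. nu in Rg ~ N**nu."""
    xs = [math.log(v) for v in lengths]
    ys = [math.log(v) for v in rg_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

In practice rg_values would hold the time-averaged radius of gyration for
each chain length, and a residue-dependent scaling would show up as different
fitted exponents for different homopeptides.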

Motivation: Drug discovery demands rapid quantification of compound-protein
interaction (CPI). However, there is a lack of methods that can predict
compound-protein affinity from sequences alone with high applicability,
accuracy, and interpretability.
Results: We present a seamless integration of domain knowledge and
learning-based approaches. Using novel representations of
structurally-annotated protein sequences, we propose a semi-supervised deep
learning model that unifies recurrent and convolutional neural networks to
exploit both unlabeled and labeled data, jointly encoding molecular
representations and predicting affinities. Our representations and models
outperform conventional options, achieving a relative error in IC50 within
5-fold for test cases and within 10-fold for protein classes not included in
training. Performance for new protein classes with little labeled data is
further improved by transfer learning. Furthermore, an attention mechanism is
embedded in our model to add to its interpretability, as...

We investigated frictional effects on the folding rates of a human Telomerase
hairpin (hTR HP) and an H-type pseudoknot from the Beet Western Yellow Virus (BWYV
PK) using simulations of the Three Interaction Site (TIS) model for RNA. The
heat capacity from TIS model simulations, calculated using temperature replica
exchange simulations, reproduces nearly quantitatively the available
experimental data for the hTR HP. The corresponding results for BWYV PK serve
as predictions. We calculated the folding rates ($k_\mathrm{F}$s) from more
than 100 folding trajectories for each value of the solvent viscosity ($\eta$)
at a fixed salt concentration of 200 mM. Using the theoretical estimate
($\propto\sqrt{N}$, where $N$ is the number of nucleotides) for the folding free
energy barrier, the $k_\mathrm{F}$ data for both RNAs are quantitatively fit
using one-dimensional Kramers' theory with two parameters specifying the
curvatures in the unfolded basin and at the barrier top. In the high-friction regime
($\eta\gtrsim10^{-5}\,\textrm{Pa s}$), for both HP and...

Protein-peptide interactions play essential roles in many cellular processes,
and their structural characterization is a major focus of current
experimental and theoretical research. Two decades ago, it was proposed to
employ steered molecular dynamics to assess the strength of protein-peptide
interactions. The idea behind using steered molecular dynamics simulations is
that mechanical stability can serve as a promising and efficient alternative
to the computationally demanding estimation of binding affinity.
However, mechanical stability, defined as a peak in the force-extension profile,
depends on the choice of the pulling direction. Here we propose an uncommon
choice of pulling direction, along the resultant dipole moment vector, which has
not been explored in simulations so far. Using explicit-solvent all-atom MD
simulations, we apply the steered molecular dynamics technique to probe the
mechanical resistance of a protein-peptide system pulled along two different
vectors. A
novel pulling direction, along the resultant dipole...

