Top 10 bioRxiv Papers Today in Bioinformatics


0.0 Mikeys
#1. iterb-PPse: Identification of transcriptional terminators in bacterial by incorporating nucleotide properties into PseKNC
Yongxian Fan, Wanru Wang, Qingqi Zhu
A terminator is a DNA sequence that gives RNA polymerase the transcriptional termination signal. Identifying terminators correctly can improve genome annotation and, more importantly, has considerable application value in disease diagnosis and therapy. However, accurate prediction methods are lacking and urgently needed. We therefore propose a prediction method for terminators, "iterb-PPse", that incorporates 47 nucleotide properties into PseKNC-I and PseKNC-II and uses Extreme Gradient Boosting to predict terminators based on Escherichia coli and Bacillus subtilis. Combining with the preceding methods, we employed three new feature extraction methods, K-pssm, Base-content, and Nucleotidepro, to formulate raw samples. A two-step method was applied to select features. When identifying terminators based on the optimized features, we compared five single models as well as 16 ensemble models. As a result, the accuracy of our method on the benchmark dataset reached 99.8%, higher than the existing state-of-the-art predictor iTerm-PseKNC...
Figures
Tweets
Github
Repository: myexperiment
User: Sarahyouzi
Language: Python
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 7982
Unique Words: 2656

0.0 Mikeys
#2. DrawAlignR: An interactive tool for across run chromatogram alignment visualization
Shubham Gupta, Justin Sing, Arshia Mahmoodi, Hannes Rost
Multi-run alignment is widely used in proteomics to establish analyte correspondence across runs. Generally, alignment algorithms return a cumulative score, which may not be easily interpretable for each peptide. Here we present a novel tool, DrawAlignR, to visualize each chromatographic alignment for DIA/SWATH data. Furthermore, we have developed a novel C++-based implementation of raw chromatogram alignment which is 35 times faster than the previously published algorithm. This not only enables users to plot alignments interactively with DrawAlignR, but also allows other software platforms to use the algorithm. DrawAlignR is an open-source web application using R Shiny that can be hosted using the source code available at https://github.com/Roestlab/DrawAlignR.
Figures
Tweets
biorxivpreprint: DrawAlignR: An interactive tool for across run chromatogram alignment visualization https://t.co/jr4x3lvqab #bioRxiv
biorxiv_bioinfo: DrawAlignR: An interactive tool for across run chromatogram alignment visualization https://t.co/FlicQTIHOL #biorxiv_bioinfo
itsjeffreyy76: RT @biorxiv_bioinfo: DrawAlignR: An interactive tool for across run chromatogram alignment visualization https://t.co/FlicQTIHOL #biorxiv_…
hdeshmuk: RT @biorxiv_bioinfo: DrawAlignR: An interactive tool for across run chromatogram alignment visualization https://t.co/FlicQTIHOL #biorxiv_…
Github

An R package for the visualization of aligned ms2 chromatograms.

Repository: DrawAlignR
User: Roestlab
Language: R
Stargazers: 0
Subscribers: 0
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 3094
Unique Words: 1222

0.0 Mikeys
#3. Dynamical determinants of different spine movements and gait speeds in rotary and transverse gallops
Tomoya Kamimura, Shinya Aoi, Yasuo Higurashi, Naomi Wada, Kazuo Tsuchiya, Fumitoshi Matsuno
Quadruped gallop is categorized into two types: rotary and transverse. While the rotary gallop involves two types of flight with different spine movements, the transverse gallop involves only one type of flight. The rotary gallop can achieve faster locomotion than the transverse gallop. To clarify these mechanisms from a dynamic viewpoint, we developed a simple model and derived periodic solutions by focusing on cheetahs and horses. The solutions gave a criterion to determine the flight type: while the ground reaction force does not change the direction of the spine movement for the rotary gallop, it does change it for the transverse gallop, which was verified with the help of animal data. Furthermore, the criterion provided the mechanism by which the rotary gallop achieves higher speed than the transverse gallop based on the flight duration. These findings improve our understanding of the mechanisms underlying different gaits that animals use.
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 6
Total Words: 0
Unique Words: 0

0.0 Mikeys
#4. Multifaceted Analysis of Training and Testing Convolutional Neural Networks for Protein Secondary Structure Prediction
Maxim Shapovalov, Roland L. Dunbrack, Slobodan Vucetic
Protein secondary structure prediction remains a vital topic with improving accuracy and broad applications. By using deep learning algorithms, prediction methods not relying on structure templates were recently reported to reach as high as 87% accuracy on 3 labels (helix, sheet or coil). Due to the lack of a widely accepted standard in secondary structure predictor development and evaluation, a fair comparison of predictors is challenging. A detailed examination of factors that contribute to higher accuracy is also lacking. In this paper, we present: (1) a new test set, Test2018, consisting of proteins from structures released in 2018 with less than 25% similarity to any protein published before 2018; (2) a 4-layer convolutional neural network, SecNet, with an input window of ±14 amino acids which was trained on proteins less than 25% identical to proteins in Test2018 and the commonly used CB513 test set; (3) a detailed ablation study where we reverse one algorithmic choice at a time in SecNet and evaluate the effect on the prediction...
Figures
None.
Tweets
biorxivpreprint: Multifaceted Analysis of Training and Testing Convolutional Neural Networks for Protein Secondary Structure Prediction https://t.co/GidAyJrGws #bioRxiv
biorxiv_bioinfo: Multifaceted Analysis of Training and Testing Convolutional Neural Networks for Protein Secondary Structure Prediction https://t.co/idJPT56jDX #biorxiv_bioinfo
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

0.0 Mikeys
#5. A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets
Barış Ekim, Bonnie Berger, Yaron Orenstein
As the volume of next generation sequencing data increases, an urgent need for algorithms to efficiently process the data arises. Universal hitting sets (UHS) were recently introduced as an alternative to the central idea of minimizers in sequence analysis with the hopes that they could more efficiently address common tasks such as computing hash functions for read overlap, sparse suffix arrays, and Bloom filters. A UHS is a set of k-mers that hit every sequence of length L, and can thus serve as indices to L-long sequences. Unfortunately, methods for computing small UHSs are not yet practical for real-world sequencing instances due to their serial and deterministic nature, which leads to long runtimes and high memory demands when handling typical values of k (e.g. k > 13). To address this bottleneck, we present two algorithmic innovations to significantly decrease runtime while keeping memory usage low: (i) we leverage advanced theoretical and architectural techniques to parallelize and decrease memory usage in calculating k-mer...
Figures
Tweets
biorxivpreprint: A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets https://t.co/beCmhMbd98 #bioRxiv
biorxiv_bioinfo: A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets https://t.co/V29muNh9mY #biorxiv_bioinfo
ocxtal: RT @biorxiv_bioinfo: A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets https://t.co/V29muNh9mY #…
pashadag: RT @biorxiv_bioinfo: A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets https://t.co/V29muNh9mY #…
CamilleMrcht: RT @biorxiv_bioinfo: A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets https://t.co/V29muNh9mY #…
BQPMalfoy: RT @biorxiv_bioinfo: A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets https://t.co/V29muNh9mY #…
alos_31: RT @biorxiv_bioinfo: A randomized parallel algorithm for efficiently finding near-optimal universal hitting sets https://t.co/V29muNh9mY #…
Github

parallel algorithms for small hitting set approximations

Repository: pasha
User: ekimb
Language: C++
Stargazers: 1
Subscribers: 1
Forks: 1
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 7824
Unique Words: 1831
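
The abstract for #5 defines a universal hitting set (UHS) as a set of k-mers that hits every sequence of length L. A minimal sketch of that definition follows, as a brute-force check that is only feasible for tiny k and L. It is not the algorithm from the paper or from the pasha repository; the function names and the toy two-letter alphabet in the usage lines are assumptions made for illustration.

```python
from itertools import product


def hits(kmers, seq, k):
    """Return True if at least one length-k window of seq is in kmers."""
    return any(seq[i:i + k] in kmers for i in range(len(seq) - k + 1))


def is_universal_hitting_set(kmers, k, L, alphabet="ACGT"):
    """Brute-force test of the UHS definition.

    Enumerates every sequence of length L over the alphabet, so the cost
    grows as len(alphabet) ** L. Illustration only.
    """
    kmers = set(kmers)
    return all(
        hits(kmers, "".join(chars), k) for chars in product(alphabet, repeat=L)
    )


# Toy usage on a two-letter alphabet (hypothetical example data).
print(is_universal_hitting_set({"AA", "CC", "AC"}, k=2, L=3, alphabet="AC"))
print(is_universal_hitting_set({"AA", "CC"}, k=2, L=3, alphabet="AC"))
```

The second call fails because the sequence ACA contains only the 2-mers AC and CA, neither of which is in the set.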

0.0 Mikeys
#6. Robust Classification of Immune Subtypes in Cancer
David L Gibbs
As part of the 'immune landscape of cancer', six immune subtypes were defined which describe a categorization of tumor-immune states. A number of phenotypic variables were found to associate with immune subtypes, such as nonsilent mutation rates, regulation of immunomodulator genes, and cytokine network structures. An ensemble classifier based on XGBoost is introduced with the goal of classifying tumor samples into one of six immune subtypes. Robust performance was accomplished through feature engineering; quartile-levels, binary gene-pair features, and gene-set-pair features were computed for each sample independently. The classifier is robust to software pipeline and normalization scheme, making it applicable to any expression data format from raw count data to TPMs since the classification is essentially based on simple binary gene-gene level comparisons within a given sample. The classifier is available as an R package or part of the CRI iAtlas portal.
Figures
None.
Tweets
matthewndavies: RT @biorxiv_bioinfo: Robust Classification of Immune Subtypes in Cancer https://t.co/N1U2z7lOQ7 #biorxiv_bioinfo
CMehdi213: RT @biorxivpreprint: Robust Classification of Immune Subtypes in Cancer https://t.co/u082qyt5sN #bioRxiv
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 0
Unique Words: 0
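
The abstract for #6 attributes the classifier's robustness to binary gene-gene comparisons made within a single sample. The sketch below illustrates that idea only. It is not the code of the R package or the iAtlas portal, and the gene names, values, and function name are hypothetical.

```python
import math


def gene_pair_features(expression, pairs):
    """Binary features for one sample.

    expression: dict mapping gene name -> expression value for this sample.
    pairs: list of (gene_a, gene_b) tuples.
    Each feature is 1 if gene_a is expressed above gene_b, else 0.
    """
    return [1 if expression[a] > expression[b] else 0 for a, b in pairs]


# Hypothetical sample and gene pairs, for illustration only.
sample = {"G1": 10.0, "G2": 3.0, "G3": 7.0}
pairs = [("G1", "G2"), ("G2", "G3"), ("G3", "G1")]

raw = gene_pair_features(sample, pairs)
scaled = gene_pair_features({g: v * 1000 for g, v in sample.items()}, pairs)
logged = gene_pair_features({g: math.log(v) for g, v in sample.items()}, pairs)
print(raw, scaled, logged)
```

Because each feature depends only on the ordering of two values within the sample, any transformation that preserves that ordering (a common scale factor or a logarithm, as above) leaves the features unchanged.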

0.0 Mikeys
#7. Metalign: Efficient alignment-based metagenomic profiling via containment min hash
Nathan LaPierre, Mohammed Alser, Eleazar Eskin, David Koslicki, Serghei Mangul
Whole-genome shotgun sequencing enables the analysis of microbial communities in unprecedented detail, with major implications in medicine and ecology. Predicting the presence and relative abundances of microbes in a sample, known as "metagenomic profiling", is a critical first step in microbiome analysis. Existing profiling methods have been shown to suffer from poor false positive or false negative rates, while alignment-based approaches are often considered accurate but computationally infeasible. Here we present a novel method, Metalign, that addresses these concerns by performing efficient alignment-based metagenomic profiling. We use a containment min hash approach to reduce the reference database size dramatically before alignment and a method to estimate organism relative abundances in the sample by resolving reads aligned to multiple genomes. We show that Metalign achieves significantly improved results over existing methods on simulated datasets from a large benchmarking study, CAMI, and performs well on in vitro mock...
Figures
Tweets
soilmicrobe: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
TJesse62: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
claczny: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
pashadag: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
SaubashyaSur: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
CamilleMrcht: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
addyblanch: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
vallenet: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
seqwave: RT @biorxivpreprint: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/TxQBn5Q8ux #bioRxiv
GUILLAUMEGAUTRE: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
kevinltweets: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
ysknishimura: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
BartWeimersLab: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
jens_uwe_ulrich: RT @biorxiv_bioinfo: Metalign: Efficient alignment-based metagenomic profiling via containment min hash https://t.co/Eklhd58TmM #biorxiv_b…
Github

Metalign: efficient alignment-based metagenomic profiling via containment min hash

Repository: Metalign
User: nlapier2
Language: HTML
Stargazers: 1
Subscribers: 2
Forks: 0
Open Issues: 2
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 8167
Unique Words: 1927
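
The abstract for #7 mentions a containment min hash step that shrinks the reference database before alignment. As a rough sketch of the underlying idea, the code below estimates what fraction of a reference's k-mers appear in a sample by checking only the reference k-mers with the smallest hash values. This is not Metalign's implementation: it assumes exact set membership where a real tool would likely use a more compact structure, and all names and sequences are made up for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def stable_hash(kmer):
    """Deterministic 64-bit hash of a k-mer (Python's built-in hash is salted)."""
    return int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")


def containment_estimate(reference, sample, k, sketch_size):
    """Estimate |kmers(reference) & kmers(sample)| / |kmers(reference)|.

    Only the sketch_size reference k-mers with the smallest hashes are tested
    against the sample. If the reference has fewer distinct k-mers than
    sketch_size, the result is the exact containment.
    """
    sample_kmers = kmers(sample, k)
    sketch = sorted(kmers(reference, k), key=stable_hash)[:sketch_size]
    if not sketch:
        return 0.0
    return sum(km in sample_kmers for km in sketch) / len(sketch)


# Hypothetical sequences: the sample fully contains the reference.
reference = "ACGTACGTGG"
sample = "TT" + reference + "CC"
print(containment_estimate(reference, sample, k=4, sketch_size=3))
```

A reference whose estimated containment falls below some threshold could then be dropped before alignment; the threshold itself is a design choice this sketch does not address.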

0.0 Mikeys
#8. A hybrid spectral library combining DIA-MS data and a targeted virtual library substantially deepens the proteome coverage
Ronghui Lou, Pan Tang, Kang Ding, Shanshan Li, Cuiping Tian, Yunxia Li, Suwen Zhao, Yaoyang Zhang, Wenqing Shui
Data-independent acquisition mass spectrometry (DIA-MS) is a rapidly evolving technique that enables relatively deep proteomic profiling with superior quantification reproducibility. DIA data mining predominantly relies on a spectral library of sufficient proteome coverage that, in most cases, is built on data-dependent acquisition-based analysis of the same sample. To expand the proteome coverage for a pre-determined protein family, we report herein on the construction of a hybrid spectral library that supplements a DIA experiment-derived library with a protein family-targeted virtual library predicted by deep learning. Leveraging this DIA hybrid library substantially deepens the coverage of three transmembrane protein families (G protein coupled receptors; ion channels; and transporters) in mouse brain tissues with increases in protein identification of 37-87%, and peptide identification of 58-161%. Moreover, of the 412 novel GPCR peptides exclusively identified with the DIA hybrid library strategy, 53.6% were validated as...
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 9
Total Words: 0
Unique Words: 0

0.0 Mikeys
#9. NetMix: A network-structured mixture model for reduced-bias estimation of altered subnetworks
Matthew A Reyna, Uthsav Chitra, Rebecca Elyanow, Benjamin J Raphael
A classic problem in computational biology is the identification of altered subnetworks: subnetworks of an interaction network that contain genes/proteins that are differentially expressed, highly mutated, or otherwise aberrant compared to other genes/proteins. Numerous methods have been developed to solve this problem under various assumptions, but the statistical properties of these methods are often unknown. For example, some widely used methods are reported to output very large subnetworks that are difficult to interpret biologically. In this work, we formulate the identification of altered subnetworks as the problem of estimating the parameters of a class of probability distributions which we call the Altered Subset Distribution (ASD). We derive a connection between a popular method, jActiveModules, and the maximum likelihood estimator (MLE) of the ASD. We show that the MLE is statistically biased, explaining the large subnetworks output by jActiveModules. We introduce NetMix, an algorithm that uses Gaussian mixture models...
Figures
None.
Tweets
vallenet: RT @biorxiv_bioinfo: NetMix: A network-structured mixture model for reduced-bias estimation of altered subnetworks https://t.co/qjLZLMROCg…
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

0.0 Mikeys
#10. A hierarchical clustering and data fusion approach for disease subtype discovery
Bastian Pfeifer, Michael G. Schimek
Recent advances in multi-omics clustering methods enable a more fine-tuned separation of cancer patients into clinically relevant clusters. These advancements have the potential to provide a deeper understanding of cancer progression and may facilitate the treatment of cancer patients. Here, we present a simple hierarchical clustering and data fusion approach, named HC-fused, for the detection of disease subtypes. Unlike other methods, the proposed approach naturally reports on the individual contribution of each single-omic to the data fusion process. We perform multi-view simulations with disjoint and disjunct cluster elements across the views to highlight fundamentally different data integration behaviour of various state-of-the-art methods. HC-fused combines the strengths of some recently published methods and shows good performance on real-world cancer data from the TCGA (The Cancer Genome Atlas) database. An R implementation of our method is available on GitHub (pievos101/HC-fused).
Figures
Tweets
biorxivpreprint: A hierarchical clustering and data fusion approach for disease subtype discovery https://t.co/NB2zBB5pha #bioRxiv
biorxiv_bioinfo: A hierarchical clustering and data fusion approach for disease subtype discovery https://t.co/cDxKV3jq3K #biorxiv_bioinfo
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 7131
Unique Words: 1956

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 255,361 papers.
