Top 10 bioRxiv Papers Today in Bioinformatics


2.169 Mikeys
#1. TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements
Fabio Navarro, Jacob Hoops, Lauren Bellfy, Eliza Cerveira, Qihui Zhu, Chengsheng Zhang, Charles Lee, Mark Gerstein
Long interspersed nuclear element 1 (LINE-1) is a primary source of genetic variation in humans and other mammals. Despite its importance, LINE-1 activity remains difficult to study because of its highly repetitive nature. Here, we developed and validated a method called TeXP to gauge LINE-1 activity accurately. TeXP builds mappability signatures from LINE-1 subfamilies to deconvolve the effect of pervasive transcription from autonomous LINE-1 activity. In particular, it apportions the multiple reads aligned to the many LINE-1 instances in the genome into these two categories. Using our method, we evaluated well-established cell lines, cell-line compartments and healthy tissues and found that the vast majority (91.7%) of transcriptome reads overlapping LINE-1 derive from pervasive transcription. We validated TeXP by independently estimating the levels of LINE-1 autonomous transcription using ddPCR, finding high concordance. Next, we applied our method to comprehensively measure LINE-1 activity across healthy somatic cells, while...
Figures
None.
Tweets
biorxivpreprint: TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements https://t.co/IGqvAlwYEZ #bioRxiv
biorxiv_bioinfo: TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements https://t.co/SWFCRxTJqH #biorxiv_bioinfo
razoralign: TeXP: Deconvolving the effects of pervasive and autonomous transcription of transposable elements https://t.co/uiTYIv0T1c https://t.co/DAYf52FqJR
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 8
Total Words: 0
Unique Words: 0

2.071 Mikeys
#2. APEC: An accesson-based method for single-cell chromatin accessibility analysis
Bin Li, Young Li, Kun Li, Lianbang Zhu, Qiaoni Yu, Jingwen Fang, Pengfei Cai, Chen Jiang, Kun Qu
The development of sequencing technologies has promoted the survey of genome-wide chromatin accessibility at single-cell resolution; however, comprehensive analysis of single-cell epigenomic profiles remains a challenge. Here, we introduce an accessibility pattern-based epigenomic clustering (APEC) method, which classifies each individual cell by groups of accessible regions with synergistic signal patterns termed "accessons". By integrating with other analytical tools, this Python-based APEC package greatly improves the accuracy of unsupervised single-cell clustering for many different public data sets. APEC also identifies significant differentially accessible sites, predicts enriched motifs, and projects pseudotime trajectories. Furthermore, we developed a fluorescent tagmentation- and FACS-sorting-based single-cell ATAC-seq technique named ftATAC-seq and investigated the per-cell regulome dynamics of mouse thymocytes. Associated with ftATAC-seq, APEC revealed a detailed epigenomic heterogeneity of thymocytes, characterized the...
Figures
Tweets
biorxivpreprint: APEC: An accesson-based method for single-cell chromatin accessibility analysis https://t.co/EpiVewblNW #bioRxiv
biorxiv_bioinfo: APEC: An accesson-based method for single-cell chromatin accessibility analysis https://t.co/KWnf6mQdwJ #biorxiv_bioinfo
razoralign: APEC: An accesson-based method for single-cell chromatin accessibility analysis https://t.co/Kz64SSk47j https://t.co/djYPSoTLMb
razoralign: APEC: An accesson-based method for single-cell chromatin accessibility analysis https://t.co/Kz64SSk47j https://t.co/CmCMywvoMO
svheeringen: RT @biorxiv_bioinfo: APEC: An accesson-based method for single-cell chromatin accessibility analysis https://t.co/KWnf6mQdwJ #biorxiv_bioi…
genolib_19: RT @biorxiv_bioinfo: APEC: An accesson-based method for single-cell chromatin accessibility analysis https://t.co/KWnf6mQdwJ #biorxiv_bioi…
sbotlite: RT @biorxivpreprint: APEC: An accesson-based method for single-cell chromatin accessibility analysis https://t.co/EpiVewblNW #bioRxiv
Github

Single cell epigenomic clustering based on accessibility pattern

Repository: APEC
User: QuKunLab
Language: Python
Stargazers: 4
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: [3]
Authors: 9
Total Words: 10639
Unique Words: 3607

2.03 Mikeys
#3. SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble
Ruth Huh, Yuchen Yang, Yuchao Jiang, Yin Shen, Yun Li
Clustering is an essential step in the analysis of single-cell RNA-seq (scRNA-seq) data to shed light on tissue complexity, including the number of cell types and the transcriptomic signatures of each cell type. Due to its importance, novel methods have been developed recently for this purpose. However, different approaches generate varying estimates regarding the number of clusters and the single-cell level cluster assignments. This type of unsupervised clustering is challenging, and it is often hard to gauge which method to use because none of the existing methods outperforms the others across all scenarios. We present SAME-clustering, a mixture model-based approach that takes clustering solutions from multiple methods and selects a maximally diverse subset to produce an improved ensemble solution. We tested SAME-clustering across 15 scRNA-seq datasets generated by different platforms, with number of clusters varying from 3 to 15, and number of single cells from 49 to 32,695. Results show that our SAME-clustering ensemble method...
Figures
None.
Tweets
razoralign: SAME-clustering: Single-cell Aggregated Clustering via Mixture Model Ensemble https://t.co/QgpqAgiwTu https://t.co/m1Ud4zSS11
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 0
Unique Words: 0

2.026 Mikeys
#4. Novel Rhabdovirus and an almost complete drain fly transcriptome recovered from two independent contaminations of clinical samples.
Francisco Brito, Mose Manni, Florian Laubscher, Manuel Schibler, Mary-Anne Hartley, Kristina Keitel, Tarsis Mlaganile, Valerie d'Acremont, Samuel Cordey, Laurent Kaiser, Evgeny M Zdobnov
Metagenomic approaches enable an open exploration of microbial communities without requiring a priori knowledge of a sample's composition by shotgun sequencing the total RNA or DNA of the sample. Such an approach is valuable for exploratory diagnostics of novel pathogens in clinical practice. Yet, one may also identify surprising off-target findings. Here we report a mostly complete transcriptome from a drain fly (likely Psychoda alternata) as well as a novel Rhabdovirus-like virus recovered from two independent contaminations of RNA sequencing libraries from clinical samples of cerebrospinal fluid (CSF) and serum, out of a total of 724 libraries sequenced at the same laboratory during a 2-year time span. This drain fly genome shows a considerable divergence from previously sequenced insects, which may obscure common clinical metagenomic analyses not expecting such contaminations. The classification of these contaminant sequences allowed us to identify infected drain flies as the likely origin of the novel Rhabdovirus-like...
Figures
Tweets
biorxivpreprint: Novel Rhabdovirus and an almost complete drain fly transcriptome recovered from two independent contaminations of clinical samples. https://t.co/7quhIrrh18 #bioRxiv
biorxiv_bioinfo: Novel Rhabdovirus and an almost complete drain fly transcriptome recovered from two independent contaminations of clinical samples. https://t.co/jzrcJqRZ2l #biorxiv_bioinfo
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 11
Total Words: 5178
Unique Words: 2141

2.025 Mikeys
#5. Studying 3D genome evolution using genomic sequence
Raphael Mourad
The 3D genome is essential to numerous key processes such as the regulation of gene expression and the replication-timing program. In vertebrates, chromatin looping is often mediated by CTCF, and marked by CTCF motif pairs in convergent orientation. Comparative Hi-C recently revealed that chromatin looping evolves across species. However, Hi-C experiments are complex and costly, which currently limits their use for evolutionary studies over a large number of species. Here, we propose a novel approach to study 3D genome evolution in vertebrates using the genomic sequence only, i.e., without the need for Hi-C data. The approach is simple and relies on comparing the distances between convergent and divergent CTCF motifs (ratio R). We show that R is a powerful statistic to detect CTCF looping encoded in the human genome sequence, thus reflecting strong evolutionary constraints encoded in DNA and associated with the 3D genome. When comparing vertebrate genomes, our results reveal that R which underlies CTCF looping and TAD...
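The abstract does not give the exact definition of R, so the following is only a minimal Python sketch of the general idea of comparing distances between convergent and divergent CTCF motif pairs. The function name `ctcf_ratio`, the restriction to adjacent motif pairs, the use of medians, and the direction of the ratio (divergent over convergent) are all assumptions made for illustration, not the paper's method.

```python
from statistics import median


def ctcf_ratio(motifs):
    """Illustrative ratio of divergent to convergent CTCF motif distances.

    `motifs` is a list of (position, strand) tuples on one chromosome,
    with strand given as "+" or "-". Hypothetical definition: only
    adjacent motif pairs are compared, and medians summarise distances.
    """
    ordered = sorted(motifs)
    convergent, divergent = [], []
    for (pos_a, strand_a), (pos_b, strand_b) in zip(ordered, ordered[1:]):
        gap = pos_b - pos_a
        if strand_a == "+" and strand_b == "-":
            convergent.append(gap)  # motifs point toward each other
        elif strand_a == "-" and strand_b == "+":
            divergent.append(gap)  # motifs point away from each other
    if not convergent or not divergent:
        raise ValueError("need at least one convergent and one divergent pair")
    return median(divergent) / median(convergent)


example = [(100, "+"), (300, "-"), (1000, "+"), (1200, "-"), (1300, "+")]
ratio = ctcf_ratio(example)
```

Under this assumed definition, a ratio above 1 would mean that convergent pairs tend to sit closer together than divergent ones in the toy input.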
Figures
None.
Tweets
biorxivpreprint: Studying 3D genome evolution using genomic sequence https://t.co/lq0ouwLumn #bioRxiv
biorxiv_bioinfo: Studying 3D genome evolution using genomic sequence https://t.co/2v47OrBnan #biorxiv_bioinfo
3D_Genome: RT @biorxiv_bioinfo: Studying 3D genome evolution using genomic sequence https://t.co/2v47OrBnan #biorxiv_bioinfo
hdeshmuk: RT @biorxiv_bioinfo: Studying 3D genome evolution using genomic sequence https://t.co/2v47OrBnan #biorxiv_bioinfo
param_p_singh: RT @biorxiv_bioinfo: Studying 3D genome evolution using genomic sequence https://t.co/2v47OrBnan #biorxiv_bioinfo
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 0
Unique Words: 0

2.005 Mikeys
#6. In Silico Benchmarking of Metagenomic Tools for Coding Sequence Detection Reveals the Limits of Sensitivity and Precision
Jonathan L Golob, Samuel S Minot
High-throughput sequencing can establish the functional capacity of a microbial community by cataloging the protein-coding sequences (CDS) present in the metagenome of the community. The relative performance of different computational methods for identifying CDS from whole-genome shotgun sequencing (WGS) is not fully established. Here we present an automated benchmarking workflow, using synthetic shotgun sequencing reads for which we know the true CDS content of the underlying communities, to determine the relative performance (sensitivity, positive predictive value or PPV, and computational efficiency) of different metagenome analysis tools for extracting the CDS content of a microbial community. Assembly-based methods are limited by coverage depth, with poor sensitivity for CDS at < 5X depth of sequencing, but have excellent PPV. Mapping-based techniques are more sensitive at low coverage depths, but can struggle with PPV. We additionally describe an expectation-maximization-based iterative algorithm which we show to...
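For readers unfamiliar with the two accuracy metrics named in this abstract, here is a small Python sketch of how sensitivity and PPV could be computed when the true CDS content is known. The function name and the representation of CDS as sets of identifiers are assumptions for illustration; this is not the authors' benchmarking workflow.

```python
def sensitivity_and_ppv(true_cds, predicted_cds):
    """Return (sensitivity, PPV) for a predicted set of CDS identifiers.

    sensitivity = TP / (TP + FN): fraction of true CDS that were detected.
    PPV         = TP / (TP + FP): fraction of detected CDS that are true.
    """
    true_cds, predicted_cds = set(true_cds), set(predicted_cds)
    tp = len(true_cds & predicted_cds)   # correctly detected
    fn = len(true_cds - predicted_cds)   # missed
    fp = len(predicted_cds - true_cds)   # spurious detections
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, ppv


# Hypothetical identifiers: four true CDS, three predictions, two correct.
sens, ppv = sensitivity_and_ppv({"cdsA", "cdsB", "cdsC", "cdsD"},
                                {"cdsA", "cdsB", "cdsX"})
```

A method that reports few but reliable CDS would score low on the first value and high on the second, which is the trade-off the abstract describes for assembly-based tools at low coverage.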
Figures
None.
Tweets
Github

Functional Analysis of Metagenomes by Likelihood Inference

Repository: FAMLI
User: FredHutch
Language: Jupyter Notebook
Stargazers: 8
Subscribers: 3
Forks: 3
Open Issues: 1
Youtube
None.
Other stats
Sample Sizes: [100, 20, 5]
Authors: 2
Total Words: 6101
Unique Words: 2154

2.004 Mikeys
#7. SciBet: An ultra-fast classifier for cell type identification using single cell RNA sequencing data
Chenwei Li, Baolin Liu, Boxi Kang, Zedao Liu, Yedan Liu, Xianwen Ren, Zemin Zhang
Robust computational and statistical methods are needed for the analysis and interpretation of single cell data, especially for discrimination and annotation of different cell types. Applying an entropy statistic to supervised gene selection, we built SciBet (Single Cell Identifier Based on Entropy Test), a Bayesian classifier that accurately predicts cell identity for any randomly sequenced cell. We demonstrate that SciBet outperforms existing tools in accuracy, robustness, speed and scalability.
Figures
None.
Tweets
razoralign: SciBet: An ultra-fast classifier for cell type identification using single cell RNA sequencing data https://t.co/6SITGd8vkb https://t.co/PDUjJ1HK0K
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 7
Total Words: 0
Unique Words: 0

2.003 Mikeys
#8. Viral quasispecies reconstruction via contig abundance estimation in variation graphs
Jasmijn Baaijens, Leen Stougie, Alexander Schoenhuth
Viral quasispecies assembly aims to reconstruct all mutant strains populating an infected patient and to provide corresponding abundance estimates. We provide a reference-genome-independent solution based on the construction of a variation graph, capturing all quasispecies diversity present in the sample. We solve the contig abundance estimation problem and propose a greedy algorithm to efficiently build full-length haplotypes. Finally, we obtain accurate frequency estimates for the reconstructed haplotypes through linear programming techniques. Our method outperforms state-of-the-art approaches in viral quasispecies assembly and has the potential to assemble bacterial genomes in a strain-aware manner as well.
Figures
Tweets
mehrshmali: RT @biorxiv_bioinfo: Viral quasispecies reconstruction via contig abundance estimation in variation graphs https://t.co/qdxO1MyjZz #biorxi…
pavel_avdeyev: RT @biorxiv_bioinfo: Viral quasispecies reconstruction via contig abundance estimation in variation graphs https://t.co/qdxO1MyjZz #biorxi…
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 9722
Unique Words: 2685

1.998 Mikeys
#9. Equitable Thresholding and Clustering (ETAC): A novel method for FMRI clustering in AFNI
Robert W Cox
This paper describes a hybrid method to threshold FMRI group statistical maps derived from voxelwise second-level statistical analyses. The proposed "Equitable Thresholding and Clustering" (ETAC) approach seeks to reduce the dependence of clustering results on arbitrary parameter values by using multiple sub-tests, each equivalent to a standard FMRI clustering analysis, to make decisions about which groups of voxels are potentially significant. The union of these sub-test results decides which voxels are accepted. The approach adjusts the cluster-thresholding parameter of each sub-test in an equitable way, so that the individual false positive rates (FPRs) are balanced across sub-tests to achieve a desired final FPR (e.g., 5%). ETAC utilizes resampling methods to estimate the FPR, and thus does not rely on parametric assumptions about the spatial correlation of FMRI noise. The approach was validated with pseudo-task timings in resting state brain data. Additionally, a task FMRI data collection was used to compare ETAC's true...
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 8362
Unique Words: 2274

1.998 Mikeys
#10. Empirically-Derived Synthetic Populations to Mitigate Small Sample Sizes
Erin E Fowler, Anders Berglund, Michael J. Schell, Thomas A Sellers, Steven Eschrich, John Heine
Limited sample sizes can hinder biomedical research and lead to spurious findings. The objective of this work is to present a new method to generate synthetic populations (SPs) from sparse data samples to aid in modeling developments. Matched case-control data (n=180 pairs) defined the limited samples. Cases and controls were considered as two separate limited samples. Synthetic populations were generated for these observed samples using multivariate unconstrained bandwidth kernel density estimations. We included four continuous variables and one categorical variable for each individual. Bandwidth matrices were determined with Differential Evolution (DE) optimization driven by covariance comparisons. Four synthetic samples (n=180) were constructed from their respective SP for comparison purposes. The observed samples were compared with equally sized synthetic samples under the hypothesis that their sample distributions were the same. Distributions were compared with the maximum mean discrepancy (MMD) test...
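As a rough illustration of drawing a synthetic sample from a kernel density estimate with an unconstrained (full) bandwidth matrix, the Python sketch below picks an observed point at random and perturbs it with Gaussian noise whose covariance is the bandwidth matrix. It is simplified to two continuous variables, and the function name, the Gaussian kernel, and the hand-picked bandwidth values are assumptions; the paper's Differential Evolution step for choosing the bandwidth and its categorical variable are not reproduced here.

```python
import math
import random


def sample_gaussian_kde(data, bandwidth, n, seed=0):
    """Draw n synthetic 2-D points from a Gaussian KDE over `data`.

    `bandwidth` is a symmetric positive-definite 2x2 matrix H used as the
    kernel covariance (an assumed convention). Each draw is a randomly
    chosen observed point plus noise distributed as N(0, H).
    """
    a, b, d = bandwidth[0][0], bandwidth[0][1], bandwidth[1][1]
    # Cholesky factor L of H, so that L @ z has covariance H for z ~ N(0, I).
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(d - l21 * l21)
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        x, y = rng.choice(data)
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        synthetic.append((x + l11 * z1, y + l21 * z1 + l22 * z2))
    return synthetic


# Hypothetical observed sample and bandwidth matrix with correlated noise.
observed = [(1.0, 2.0), (1.5, 2.4), (2.0, 3.1), (2.2, 2.9)]
H = [[0.04, 0.01], [0.01, 0.09]]
synthetic_sample = sample_gaussian_kde(observed, H, n=180)
```

The off-diagonal entries of H are what make the bandwidth "unconstrained": they let the smoothing noise follow correlations between variables instead of acting on each variable independently.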
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: [180, 180, 180, 180, 180]
Authors: 6
Total Words: 9491
Unique Words: 2730

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter activity).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 131,277 papers.
