Top 10 Biorxiv Papers Today in Bioinformatics


2.063 Mikeys
#1. Ensembles from ordered and disordered proteins reveal similar structural constraints during evolution
Julia Marchetti, Alexander Monzon, Silvio C.E. Tosatto, Gustavo Parisi, Maria Silvina Fornasari
Inter-residue contacts determine the structural properties of each conformer in the ensembles describing the native state of proteins. Structural constraints during evolution could then provide biologically relevant information about the conformational ensembles and their relationship with protein function. Here, we studied the proportion of sites evolving under structural constraints in two very different types of ensembles, those coming from ordered or disordered proteins. Using a structurally constrained model of protein evolution, we found that both types of ensembles show a comparable proportion, near 40%, of positions evolving under structural constraints. Among these sites, ~68% are in disordered regions and ~57% of them show long-range inter-residue contacts. We also found that disordered ensembles are redundant with respect to their structurally constrained evolutionary information and could be described on average with ~11 conformers. Despite the different complexity of the studied ensembles and proteins, the similar...
Tweets
biorxivpreprint: Ensembles from ordered and disordered proteins reveal similar structural constraints during evolution https://t.co/NjVK9QzoXb #bioRxiv
biorxiv_bioinfo: Ensembles from ordered and disordered proteins reveal similar structural constraints during evolution https://t.co/72dX9NUmHC #biorxiv_bioinfo
AlexanderMonzon: Look at our latest pre-print about "Ensembles from ordered and disordered proteins reveal similar structural constraints during evolution" #proteindisorder #ensembles #evolution https://t.co/zQN8XYVRvl
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 6006
Unique Words: 1822

2.037 Mikeys
#2. Sketching Algorithms for Genomic Data Analysis and Querying in a Secure Enclave
Can Kockan, Kaiyuan Zhu, Natnatee Dokmai, Nikolai Karpov, Oguzhan Kulekci, David Woodruff, Cenk Sahinalp
Current practices in collaborative genomic data analysis (e.g. PCAWG) require all involved parties either to exchange individual patient data and perform all analysis locally, or to use a trusted server that maintains all data and performs the analysis at a single site (e.g. the Cancer Genome Collaboratory). Since both approaches involve sharing genomic sequence data, which is typically not feasible due to privacy issues, collaborative data analysis remains a rarity in genomic medicine. To facilitate efficient and effective collaborative or remote genomic computation, we introduce SkSES (Sketching algorithms for Secure Enclave based genomic data analysiS), a computational framework for performing data analysis and querying on multiple, individually encrypted genomes from several institutions in an untrusted cloud environment. Unlike other techniques for secure/privacy preserving genomic data analysis, which typically rely on sophisticated cryptographic techniques with prohibitively large computational overheads, SkSES utilizes...
Tweets
biorxivpreprint: Sketching Algorithms for Genomic Data Analysis and Querying in a Secure Enclave https://t.co/lOszdJKXY5 #bioRxiv
biorxiv_bioinfo: Sketching Algorithms for Genomic Data Analysis and Querying in a Secure Enclave https://t.co/ZzrF3M5Wlr #biorxiv_bioinfo
calkan_cs: Cool, @ckockan_cs. But you know SGX is hacked, right? Does it affect your "success"? https://t.co/Sy9zvkJ7gc
razoralign: Sketching Algorithms for Genomic Data Analysis and Querying in a Secure Enclave: https://t.co/oB48UkShhr
TheFirstNuomics: https://t.co/ydEk1zKxOD
nomad421: RT @biorxiv_bioinfo: Sketching Algorithms for Genomic Data Analysis and Querying in a Secure Enclave https://t.co/ZzrF3M5Wlr #biorxiv_bioi…
mtanichthys: RT @biorxiv_bioinfo: Sketching Algorithms for Genomic Data Analysis and Querying in a Secure Enclave https://t.co/ZzrF3M5Wlr #biorxiv_bioi…
TaherMun: RT @biorxiv_bioinfo: Sketching Algorithms for Genomic Data Analysis and Querying in a Secure Enclave https://t.co/ZzrF3M5Wlr #biorxiv_bioi…
Github
Repository: sgx-genome-variants-search
User: ndokmai
Language: C++
Stargazers: 0
Subscribers: 2
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 7
Total Words: 9325
Unique Words: 2629

2.032 Mikeys
#3. Sparse Binary Relation Representations for Genome Graph Annotation
Mikhail Karasikov, Harun Mustafa, Amir Joudaki, Sara Javadzadeh No, Gunnar Rätsch, André Kahles
High-throughput DNA sequencing data is accumulating in public repositories, and efficient approaches for storing and indexing such data are in high demand. In recent research, several graph data structures have been proposed to represent large sets of sequencing data and allow for efficient query of sequences. In particular, the concept of colored de Bruijn graphs has been explored by several groups. While there has been good progress towards representing the sequence graph in small space, methods for storing a set of labels on top of such graphs are still not sufficiently explored. It is also currently not clear how characteristics of the input data, such as the sparsity and correlations of labels, can help to inform the choice of method to compress the labels. In this work, we present a systematic analysis of five different state-of-the-art annotation compression schemes that evaluates key metrics on both artificial and real-world data and discusses how different data characteristics influence the compression performance. In...
Figures
None.
Tweets
biorxivpreprint: Sparse Binary Relation Representations for Genome Graph Annotation https://t.co/IzWtjZVzc3 #bioRxiv
biorxiv_bioinfo: Sparse Binary Relation Representations for Genome Graph Annotation https://t.co/oV2qUHQ92R #biorxiv_bioinfo
razoralign: Multi-BRWT: Sparse Binary Relation Representations for Genome Graph Annotation: https://t.co/COOCLKplWn
francois_sabot: RT @biorxiv_bioinfo: Sparse Binary Relation Representations for Genome Graph Annotation https://t.co/oV2qUHQ92R #biorxiv_bioinfo
GuillaumOleSan: RT @biorxiv_bioinfo: Sparse Binary Relation Representations for Genome Graph Annotation https://t.co/oV2qUHQ92R #biorxiv_bioinfo
GUILLAUMEGAUTRE: RT @biorxiv_bioinfo: Sparse Binary Relation Representations for Genome Graph Annotation https://t.co/oV2qUHQ92R #biorxiv_bioinfo
hdeshmuk: RT @biorxiv_bioinfo: Sparse Binary Relation Representations for Genome Graph Annotation https://t.co/oV2qUHQ92R #biorxiv_bioinfo
TaherMun: RT @biorxiv_bioinfo: Sparse Binary Relation Representations for Genome Graph Annotation https://t.co/oV2qUHQ92R #biorxiv_bioinfo
Github

Sparse Binary Relation Representations for Genome Graph Annotation

Repository: genome_graph_annotation
User: ratschlab
Language: C++
Stargazers: 0
Subscribers: 15
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 7931
Unique Words: 2625

2.017 Mikeys
#4. Embracing the dropouts in single-cell RNA-seq data
Peng Qiu
One primary reason the analysis of single-cell RNA-seq data is challenging is dropout, where the data capture only a small fraction of the transcriptome of each cell. Many computational algorithms developed for single-cell RNA-seq adopt gene selection and dimension reduction strategies to address dropout. Here, an opposite view is explored. Instead of treating dropout as a problem to be fixed, we embrace it as a useful signal for defining cell types. We present an iterative co-occurrence clustering algorithm that works with binarized single-cell RNA-seq count data. Surprisingly, although all the quantitative information is removed after the data is binarized, co-occurrence clustering of the binarized data is able to effectively identify cell populations, as well as cell-type specific pathways and signatures. We demonstrate that the binary dropout patterns of the data provide not only overlapping but also complementary information compared to the quantitative gene expression counts in single-cell RNA-seq data.
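To make the idea of "binarized count data" concrete, here is a minimal Python sketch. It is not the paper's iterative co-occurrence clustering algorithm: it only shows how a count matrix can be reduced to detected/not-detected values and how a simple gene-by-gene co-occurrence count could be derived from it. The function names, the zero threshold, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def binarize(counts):
    """Return 1 where a gene was detected in a cell (count > 0), else 0.

    The threshold of zero is an assumption for this sketch.
    """
    return (np.asarray(counts) > 0).astype(int)

def gene_cooccurrence(binary):
    """Count, for each pair of genes, the cells in which both are detected.

    `binary` has shape (genes, cells); the result has shape (genes, genes).
    """
    return binary @ binary.T

# Hypothetical toy data: 3 genes (rows) measured in 4 cells (columns).
counts = np.array([
    [5, 0, 3, 0],
    [2, 0, 1, 0],
    [0, 4, 0, 7],
])

binary = binarize(counts)
cooc = gene_cooccurrence(binary)
print(cooc)
```

In this toy matrix the first two genes are detected in exactly the same cells, so their co-occurrence count equals their individual detection counts, while the third gene never co-occurs with either. A clustering method could use patterns like this as its input signal.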
Tweets
dblipka1: A different approach to #scRNA-seq analysis: Embracing the dropouts in single-cell RNA-seq data https://t.co/Ds8bLkPRoT
4130chromo: Embracing the dropouts in single-cell RNA-seq data https://t.co/fEtxDtb3d0
tSILIChtetL: RT @biorxivpreprint: Embracing the dropouts in single-cell RNA-seq data https://t.co/w8NW9EtEI3 #bioRxiv
Blood_Buff: RT @biorxivpreprint: Embracing the dropouts in single-cell RNA-seq data https://t.co/w8NW9EtEI3 #bioRxiv
salazarbiol: RT @biorxivpreprint: Embracing the dropouts in single-cell RNA-seq data https://t.co/w8NW9EtEI3 #bioRxiv
nafiz_h: RT @biorxivpreprint: Embracing the dropouts in single-cell RNA-seq data https://t.co/w8NW9EtEI3 #bioRxiv
melancronico: RT @biorxivpreprint: Embracing the dropouts in single-cell RNA-seq data https://t.co/w8NW9EtEI3 #bioRxiv
Sivico26: RT @biorxivpreprint: Embracing the dropouts in single-cell RNA-seq data https://t.co/w8NW9EtEI3 #bioRxiv
Github

easy access to benchmark datasets

Repository: easy-data
User: czi-hca-comp-tools
Language: None
Stargazers: 40
Subscribers: 16
Forks: 18
Open Issues: 3
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 7129
Unique Words: 2036

2.012 Mikeys
#5. Triple layered QSAR Studies on Substituted 1,2,4-Trioxanes as potential antimalarial agents: Superiority of the Quantitative Pharmacophore-Based Alignment, Over Common Substructure based Alignment
Amit K Gupta, Anil K Saxena
The present study reports the utilization of three approaches, viz. Pharmacophore, CoMFA, CoMSIA and HQSAR studies, to identify the essential structural requirements in 3D chemical space for the modulation of the antimalarial activity of substituted 1,2,4-trioxanes. The superiority of Quantitative pharmacophore based alignment (QuantitativePBA) over global minima energy conformer-based alignment (GMCBA) has been reported in CoMFA and CoMSIA studies. The developed models showed good statistical significance in internal validation (q2, group cross-validation and bootstrapping) and performed very well in predicting antimalarial activity of test set compounds. Structural features in terms of their steric, electrostatic, and hydrophobic interactions in 3D space have been found important for the antimalarial activity of substituted 1,2,4-trioxanes. Further, the HQSAR studies based on the same training and test set acted as an additional tool to find the sub-structural fingerprints of substituted 1,2,4-trioxanes for their antimalarial...
Tweets
biorxivpreprint: Triple layered QSAR Studies on Substituted 1,2,4-Trioxanes as potential antimalarial agents: Superiority of the Quantitative Pharmacophore-Based Alignment, Over Common Substructure based Alignment https://t.co/XebPxqtIFc #bioRxiv
biorxiv_bioinfo: Triple layered QSAR Studies on Substituted 1,2,4-Trioxanes as potential antimalarial agents: Superiority of the Quantitative ... https://t.co/HMmoGOJ33f #biorxiv_bioinfo
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 7123
Unique Words: 2095

2.006 Mikeys
#6. MetaGxData: Clinically Annotated Breast, Ovarian and Pancreatic Cancer Datasets and their Use in Generating a Multi-Cancer Gene Signature
Deena M.A. Gendoo, Michael Zon, Vandana Sandhu, Venkata Manem, Natchar Ratanasirigulchai, Gregory M. Chen, Levi Waldron, Benjamin Haibe-Kains
A wealth of transcriptomic and clinical data on solid tumours are under-utilized due to unharmonized data storage and format. We have developed the MetaGxData package compendium, which includes manually-curated and standardized clinical, pathological, survival, and treatment metadata across breast, ovarian, and pancreatic cancer data. MetaGxData is the largest compendium of curated transcriptomic data for these cancer types to date, spanning 86 datasets and encompassing 15,249 samples. Open access to standardized metadata across cancer types promotes use of their transcriptomic and clinical data in a variety of cross-tumour analyses, including identification of common biomarkers, establishing common patterns of co-expression networks, and assessing the validity of prognostic signatures. Here, we demonstrate that MetaGxData is a flexible framework that facilitates meta-analyses by using it to identify common prognostic genes in ovarian and breast cancer. Furthermore, we use the data compendium to create the first gene signature...
Tweets
seandavis12: MetaGxData: Clinically Annotated Breast, Ovarian and Pancreatic Cancer Datasets and their Use in Generating a Multi-Cancer Gene Signature https://t.co/cfxZ9QBswc
TheFirstNuomics: https://t.co/Q0RzTDWSxv
Github
None.
Youtube
None.
Other stats
Sample Sizes : [4425, 2695, 1858, 2712, 1928, 1000, 2136]
Authors: 8
Total Words: 9725
Unique Words: 2669

2.004 Mikeys
#7. Sparse variable and covariance selection for high-dimensional seemingly unrelated Bayesian regression
Marco Banterle, Leonardo Bottolo, Sylvia Richardson, Mika Ala-Korpela, Marjo-Riitta Jarvelin, Alex Lewin
High-throughput technology for molecular biomarkers is increasingly producing multivariate phenotype data exhibiting strong correlation structures. Existing approaches for combining such data with genetic variants for multivariate Quantitative Trait Loci analysis generally either ignore correlation structure or make other restrictive assumptions about the associations between phenotypes and genetic loci. We present a Bayesian Variable Selection (BVS) model with sparse variable and covariance selection for high-dimensional seemingly unrelated regressions. The model includes a matrix of binary variable selection indicators for multivariate regression, thus allowing different phenotype responses to be associated with different genetic predictors (a seemingly unrelated regressions framework). A general covariance structure is allowed for the residuals relating to the conditional dependencies between phenotype variables. The covariance structure may be dense (unrestricted) or sparse, with a graphical modelling prior. The graphical...
Tweets
razoralign: BVS: Sparse variable and covariance selection for high-dimensional seemingly unrelated Bayesian regression: https://t.co/8Iy0cAfQT9
Github

MCMC sampler for Sparse SUR model

Repository: Bayesian_SSUR
User: mbant
Language: C++
Stargazers: 0
Subscribers: 0
Forks: 1
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 16626
Unique Words: 3771

2.002 Mikeys
#8. Matrix linear models for high-throughput chemical genetic screens
Jane W Liang, Robert J Nichols, Saunak Sen
We develop a flexible and computationally efficient approach for analyzing high-throughput chemical genetic screens. In such screens, a library of genetic mutants is phenotyped in a large number of stresses. The goal is to detect interactions between genes and stresses. Typically, this is achieved by grouping the mutants and stresses into categories, and performing modified t-tests for each combination. This approach does not have a natural extension if mutants or stresses have quantitative or non-overlapping annotations (e.g., if conditions have doses, or a mutant falls into more than one category simultaneously). We develop a matrix linear model framework that allows us to model relationships between mutants and conditions in a simple, yet flexible multivariate framework. It encodes both categorical and continuous relationships to enhance detection of associations. To handle large datasets, we develop a fast estimation approach that takes advantage of the structure of matrix linear models. We evaluate our method's performance in...
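As a rough illustration of what a matrix linear model can look like, the Python sketch below assumes the bilinear form Y = X B Z' + E, where X holds covariates describing the rows of the response matrix Y and Z holds covariates describing its columns. The abstract does not state this form, so it is an assumption here, and the ordinary least-squares estimate shown is a generic textbook estimator, not necessarily the fast estimation approach the authors develop. All names and the simulated data are hypothetical.

```python
import numpy as np

def mlm_least_squares(Y, X, Z):
    """Least-squares estimate of B in the assumed model Y = X @ B @ Z.T + E.

    Y: (n, m) responses, X: (n, p) row covariates, Z: (m, q) column covariates.
    Returns B_hat with shape (p, q), computed as (X'X)^-1 X' Y Z (Z'Z)^-1.
    """
    left = np.linalg.solve(X.T @ X, X.T)    # (X'X)^-1 X', shape (p, n)
    right = np.linalg.solve(Z.T @ Z, Z.T)   # (Z'Z)^-1 Z', shape (q, m)
    return left @ Y @ right.T

# Hypothetical noiseless simulation: the estimate should recover B_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))       # 6 rows described by 2 covariates
Z = rng.normal(size=(5, 3))       # 5 columns described by 3 covariates
B_true = rng.normal(size=(2, 3))  # interaction coefficients
Y = X @ B_true @ Z.T

B_hat = mlm_least_squares(Y, X, Z)
print(np.allclose(B_hat, B_true))
```

Because the estimate only needs the small matrices X'X and Z'Z, it avoids building the large Kronecker-product design matrix that an equivalent vectorized regression would require, which is one plausible way the model's structure can be exploited.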
Tweets
biorxivpreprint: Matrix linear models for high-throughput chemical genetic screens https://t.co/W8knlivg2u #bioRxiv
biorxiv_bioinfo: Matrix linear models for high-throughput chemical genetic screens https://t.co/GhM2nS0OYp #biorxiv_bioinfo
rnomics: Top #tweeted story in #bioinformatics: Matrix linear models for high-throughput chemical genetic screens | bioRxiv https://t.co/Zh7jAl6HdF, see more https://t.co/x4TiUjeQ4E
hdeshmuk: RT @biorxiv_bioinfo: Matrix linear models for high-throughput chemical genetic screens https://t.co/GhM2nS0OYp #biorxiv_bioinfo
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 6706
Unique Words: 1796

1.999 Mikeys
#9. A case study on the detailed reproducibility of a human cell atlas project
Kui Hua, Xuegong Zhang
Reproducibility is a defining feature of a scientific discovery. Reproducibility can be at different levels for different types of study. The purpose of the Human Cell Atlas (HCA) project is to build maps of molecular signatures of all human cell types and states to serve as references for future discoveries. Constructing such a complex reference atlas must involve the assembly and aggregation of data from multiple labs, probably generated with different technologies. It has much higher requirements on reproducibility than individual research projects. To add another layer of complexity, the bioinformatics procedures involved for single-cell data have high flexibility and diversity. There are many factors in the processing and analysis of single-cell RNA-seq data that can shape the final results in different ways. To study what levels of reproducibility can be reached in current practices, we conducted a detailed reproduction study for a well-documented recent publication on the atlas of human blood dendritic cells as an example...
Tweets
TheFirstNuomics: https://t.co/eXHHC5u7dm
jwbelmon: RT @biorxivpreprint: A case study on the detailed reproducibility of a human cell atlas project https://t.co/P12CzG0pC3 #bioRxiv
PrecursorCell: RT @biorxivpreprint: A case study on the detailed reproducibility of a human cell atlas project https://t.co/P12CzG0pC3 #bioRxiv
Github
None.
Youtube
None.
Other stats
Sample Sizes : [269, 174]
Authors: 2
Total Words: 4341
Unique Words: 1391

1.998 Mikeys
#10. Predicting trait regulators by identifying co-localization of DNA binding and GWAS variants in regulatory regions
Gerald Quon, Soheil Feizi, Daniel Marbach, Melina Claussnitzer, Manolis Kellis
Genomic regions associated with complex traits and diseases are primarily located in non-coding regions of the genome and have unknown mechanisms of action. A critical step to understanding the genetics of complex traits is to fine-map each associated locus; that is, to find the causal variant(s) that underlie genetic associations with a trait. Fine-mapping approaches are currently focused on identifying genomic annotations, such as transcription factor binding sites, which are enriched in direct overlap with candidate causal variants. We introduce CONVERGE, the first computational tool to search for co-localization of GWAS causal variants with transcription factor binding sites in the same regulatory regions, without requiring direct overlap. As a proof of principle, we demonstrate that CONVERGE is able to identify five novel regulators of type 2 diabetes, which were subsequently validated in knockdown experiments in pancreatic beta cells, while existing fine-mapping methods were unable to find any statistically significant regulators....
Tweets
biorxivpreprint: Predicting trait regulators by identifying co-localization of DNA binding and GWAS variants in regulatory regions https://t.co/sHaJdQ716v #bioRxiv
biorxiv_bioinfo: Predicting trait regulators by identifying co-localization of DNA binding and GWAS variants in regulatory regions https://t.co/WEHfxh92Ig #biorxiv_bioinfo
kousikbioinfo: CONVERGE - a computational tool to search for co-localization of GWAS causal variants with transcription factor binding sites in the same regulatory regions, without requiring direct overlap from @manoliskellis and colleagues. https://t.co/sVqwmxol2b
seandavis12: Predicting trait regulators by identifying co-localization of DNA binding and GWAS variants in regulatory regions https://t.co/av7YHsD4UD
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 15540
Unique Words: 3728

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 56,474 papers.
