Top 10 bioRxiv Papers Today in Bioinformatics


2.013 Mikeys
#1. Creating Artificial Human Genomes Using Generative Models
Burak Yelmen, Aurelien Decelle, Linda Ongaro, Davide Marnetto, Corentin Tallec, Francesco Montinaro, Cyril Furtlehner, Luca Pagani, Flora Jay
Generative models have shown breakthroughs in a wide spectrum of domains due to recent advancements in machine learning algorithms and increased computational power. Despite these impressive achievements, the ability of generative models to create realistic synthetic data is still under-exploited in genetics and absent from population genetics. Yet a known limitation of this field is the reduced access to many genetic databases due to concerns about violations of individual privacy, although they would provide a rich resource for data mining and integration towards advancing genetic studies. Here we demonstrate that we can train deep generative adversarial networks (GANs) and restricted Boltzmann machines (RBMs) to learn the high dimensional distributions of real genomic datasets and create artificial genomes (AGs). Additionally, we ensure none to little privacy loss while generating high quality AGs. To illustrate the promising outcomes of our method, we show that augmenting reference panels with AGs improves imputation quality...
Github
None.
Other stats
Sample Sizes: None.
Authors: 9
Total Words: 0
Unique Words: 0

2.007 Mikeys
#2. miRCoop: Identifying Cooperating miRNAs via Kernel Based Interaction Tests
Gulden Olgun, Oznur Tastan
Although miRNAs can cause widespread changes on expression programs, single miRNAs typically induce mild repression on their targets. Cooperativity is reported as one strategy to overcome this constraint. Expanding the catalog of synergistic miRNAs is critical for understanding the regulation of various gene expression programs. In this study, we develop miRCoop to identify synergistic miRNA pairs which have weak or no repression on the target mRNA, but when bound together induce strong repression of their target's expression. To discover triplets of RNAs whose expression levels follow these statistical interaction patterns, miRCoop uses kernel-based interaction tests together with miRNA and mRNA target information. We apply our approach to kidney tumor and identify 66 putative triplets. For 64 of these triplets, there is at least one common transcription factor that potentially regulates all participating RNAs of the triplet supporting a functional association among them. Furthermore, we find triplets are enriched for biological...
Github

Identifying Cooperating miRNAs via Kernel Based Interaction Tests

Repository: miRCoop
User: guldenolgun
Language: MATLAB
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 8915
Unique Words: 2853

2.002 Mikeys
#3. MetroNome - a visual data exploration platform for integrating human genotypic and phenotypic data across diseases
Christian Stolte, Kevin Shi, Nina Lapchyk, Nathaniel Novod, Avinash Abhyankar, Lyle W Ostrow, Hemali Phatnani, Toby Bloom
MetroNome is a web-based visual data exploration platform which integrates de-identified genomic, transcriptomic, and phenotypic data sets. Users can define and compare cohorts constructed from multimodal data and share the data and analyses with outside tools. MetroNome's interactive visualization and analysis tools allow researchers to quickly form and explore novel hypotheses. The deidentified data is linked back to the source biosample inventories in multiple biobanks, enabling researchers to further investigate new ideas using the most relevant samples.
Github

Visual Data Exploration for Genomic Medicine

Repository: metronome
User: nygenome
Language: None
Stargazers: 0
Subscribers: 10
Forks: 0
Open Issues: 0
Other stats
Sample Sizes: None.
Authors: 8
Total Words: 4600
Unique Words: 2020

1.998 Mikeys
#4. Finding differentially expressed sRNA-Seq regions with srnadiff
Matthias Zytnicki, Ignacio González
Small RNAs (sRNAs) encompass a great variety of molecules, such as micro RNAs, small interfering RNAs, and Piwi-associated RNAs, among others. These sRNAs have a wide range of activities, including gene regulation, protection against viruses, and transposable element silencing, and have been identified as key actors for studying and understanding the development of the cell. Small RNA sequencing is thus routinely used to assess the expression of the diversity of sRNAs, usually in the context of differential expression, where two conditions are compared. Many tools have been presented to detect differentially expressed micro RNAs, because they are well documented and the associated genes are well defined. However, tools are lacking to detect other types of sRNAs, which are less studied and have an imprecise "gene" structure. We present here a new method, called srnadiff, to find all kinds of differentially expressed sRNAs. To the best of our knowledge, srnadiff is the first tool that detects differentially...
Github
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 6911
Unique Words: 2266

1.998 Mikeys
#5. ChIP-Hub: an Integrative Platform for Exploring Plant Regulome
Dijun Chen, Liang-Yu Fu, Peijing Zhang, Ming Chen, Kerstin Kaufmann
Plant genomes encode a complex and evolutionarily diverse regulatory grammar that forms the basis for most life on earth. A wealth of regulome and epigenome data have been generated in various plant species, but no common, standardized resource is available so far for biologists. Here we present ChIP-Hub, an integrative web-based platform following the ENCODE standards that bundles publicly available datasets reanalyzed from >40 plant species, allowing visualization and meta-analysis.
Github
None.
Other stats
Sample Sizes: [3078, 32, 1647, 3087]
Authors: 5
Total Words: 6779
Unique Words: 2475

1.997 Mikeys
#6. Machine Learning in Quality Assessment of Early Stage Next-Generation Sequencing Data
Steffen Albrecht, Miguel A Andrade-Navarro, Jean-Fred Fontaine
Next Generation Sequencing is a powerful technology highly relevant in biomedical research and the pharmaceutical industry. Applied in combination with molecular assays, it provides detailed insights into genomic properties such as chromatin accessibility or protein-DNA interactions. However, the biological relevance of results from these assays is extremely sensitive to the quality of the sequencing data. So far, quality control tools require extensive computational resources and manual inspection. This is critical considering the increasing amount of sequencing data due to decreasing costs. In this study, we investigated the possibility of automatically classifying the quality of a large set of raw sequencing data in fastq format by using state-of-the-art machine learning algorithms and a comprehensive grid search to tune the parameters. The results showed high classification accuracy in discriminating between low and high quality files. Gradient Boosting machines performed best in most of the tested scenarios. Furthermore, some...
Github
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

1.997 Mikeys
#7. A systematic pipeline for classifying bacterial operons reveals the evolutionary landscape of biofilm machineries
Cedoljub Bundalovic-Torma, Gregory B Whitfield, Lindsey S Marmont, P. Lynne Howell, John Parkinson
In bacteria, functionally related genes comprising metabolic pathways and protein complexes are frequently encoded in operons and are widely conserved across phylogenetically diverse species. The evolution of these operon-encoded processes is affected by diverse mechanisms such as gene duplication, loss, rearrangement, and horizontal transfer. These mechanisms can result in functional diversification of gene families, increasing the potential for evolution of novel biological pathways, and serve to adapt pre-existing pathways to the requirements of particular environments. Despite the fundamental importance that these mechanisms play in bacterial environmental adaptation, a systematic approach for studying the evolution of operon organization is lacking. Herein, we present a novel method to study the evolution of operons based on phylogenetic clustering of operon-encoded protein families and genomic-proximity network visualizations of operon architectures. We applied this approach to study the evolution of the synthase dependent...
Github
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 17500
Unique Words: 5055

1.997 Mikeys
#8. Copy number motifs expose genome instability type and predict driver events and disease outcome in breast cancer
Arne V Pladsen, Gro Nilsen, Oscar M Rueda, Miriam R Aure, Ørnulf Borgan, Knut Liestøl, Valeria Vitelli, Arnoldo Frigessi, Anita Langerød, OSBREAC, Anthony Mathelier, Olav Engebråten, David Wedge, Peter Van Loo, Carlos Caldas, Anne-Lise Børresen-Dale, Hege G Russnes, Ole Christian Lingjærde
Tumor evolution is dependent on and constrained by the genotypes emerging from genome instability. We hypothesized that non-site-specific copy number motifs would correlate with underlying replication defects and also with tumor and patient fate. Six feature detectors were defined to characterize and score the local spatial behaviour of a copy number profile. By accumulating scores across genomic regions, a low-dimensional representation of the tumor genome was obtained. The proposed Copy Aberration Regional Mapping Analysis (CARMA) algorithm was applied to 2384 breast tumors from three breast cancer cohorts, revealing distinct copy number motifs in established molecular subtypes. A prognostic index combining the features predicted breast cancer specific survival better than both the genomic instability index (GII) and all commonly used clinical stratifications. CARMA offers effective comparison of tumor subgroups and extracts biologically and clinically relevant features from allele-specific copy number profiles.
Github
None.
Other stats
Sample Sizes: None.
Authors: 18
Total Words: 0
Unique Words: 0

1.997 Mikeys
#9. De Novo Protein Design for Novel Folds using Guided Conditional Wasserstein Generative Adversarial Networks (gcWGAN)
Mostafa Karimi, Shaowen Zhu, Yue Cao, Yang Shen
Motivation: Facing data quickly accumulating on protein sequence and structure, this study addresses the following question: to what extent could current data alone reveal deep insights into the sequence-structure relationship, such that new sequences can be designed accordingly for novel structure folds? Results: We have developed novel deep generative models, constructed a low-dimensional and generalizable representation of fold space, exploited sequence data with and without paired structures, and developed an ultra-fast fold predictor as an oracle providing feedback. The resulting semi-supervised gcWGAN is assessed with the oracle over 100 novel folds not in the training set and found to generate more yields and cover 3.6 times more target folds compared to a competing data-driven method (cVAE). Assessed with a structure predictor over representative novel folds (including one not even part of the basis folds), gcWGAN designs are found to have comparable or better fold accuracy yet much more sequence diversity and novelty than cVAE....
Github
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 6657
Unique Words: 2191

1.997 Mikeys
#10. Causal inference for the effect of environmental chemicals on chronic kidney disease
Jing Zhao, Paige Hinton, Qin Ma
There is evidence from a limited number of statistical and animal studies suggesting that perfluoroalkyl acids (PFAs) are linked to a decline in kidney function. Thus, PFA exposure may be a modifiable risk factor for chronic kidney disease (CKD). As PFA is pervasive throughout our environment, determining its health effects is an important public health concern. We examined cross-sectional data from the 2009-2010 cycle of NHANES using generalized propensity score (GPS) analysis and univariate and multivariate ordinary least squares (OLS) regression to determine the link between urinary PFA concentration and estimated glomerular filtration rate (eGFR). GPS estimation methods used were Hirano-Imbens, additive spline, and a generalized additive model. Each of the statistical models used associated an increase in PFA concentration with a decline in eGFR, though the eGFR values fit using the multivariate regression model were consistently higher than those from the other four models. We conclude that PFA is a modifiable risk factor for CKD and...
Github
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real time) based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 189,566 papers.
