Top 10 Biorxiv Papers Today in Genomics


2.181 Mikeys
#1. H3K4me3 is neither instructive for, nor informed by, transcription.
Struan C Murray, Philipp Lorenz, Francoise Howe, Meredith Wouters, Thomas Brown, Shidong Xi, Harry Fischl, Walaa Khushaim, Joseph Regish Rayappu, Andrew Angel, Jane Mellor
H3K4me3 is a near-universal histone modification found predominantly at the 5' region of genes, with a well-documented association with gene activity. H3K4me3 has been ascribed roles as both an instructor of gene expression and also a downstream consequence of expression, yet neither has been convincingly proven on a genome-wide scale. Here we test these relationships using a combination of bioinformatics, modelling and experimental data from budding yeast in which the levels of H3K4me3 have been massively ablated. We find that loss of H3K4me3 has no effect on the levels of nascent transcription or transcript in the population. Moreover, we observe no change in the rates of transcription initiation, elongation, mRNA export or turnover, or in protein levels, or cell-to-cell variation of mRNA. Loss of H3K4me3 also has no effect on the large changes in gene expression patterns that follow galactose induction. Conversely, loss of RNA polymerase from the nucleus has no effect on the pattern of H3K4me3 deposition and little effect on...
Figures
None.
Tweets
marcotrizzino: RT @biorxivpreprint: H3K4me3 is neither instructive for, nor informed by, transcription. https://t.co/F0v5RwyphZ #bioRxiv
BioGibberish: RT @biorxivpreprint: H3K4me3 is neither instructive for, nor informed by, transcription. https://t.co/F0v5RwyphZ #bioRxiv
EvolPaper: RT @biorxiv_genomic: H3K4me3 is neither instructive for, nor informed by, transcription. https://t.co/7nGmmmd48G #biorxiv_genomic
PrecursorCell: RT @biorxivpreprint: H3K4me3 is neither instructive for, nor informed by, transcription. https://t.co/F0v5RwyphZ #bioRxiv
jason_tanny: RT @biorxivpreprint: H3K4me3 is neither instructive for, nor informed by, transcription. https://t.co/F0v5RwyphZ #bioRxiv
leehenry1971: RT @biorxivpreprint: H3K4me3 is neither instructive for, nor informed by, transcription. https://t.co/F0v5RwyphZ #bioRxiv
LeungCal: RT @biorxivpreprint: H3K4me3 is neither instructive for, nor informed by, transcription. https://t.co/F0v5RwyphZ #bioRxiv
tito_tasks: RT @biorxivpreprint: H3K4me3 is neither instructive for, nor informed by, transcription. https://t.co/F0v5RwyphZ #bioRxiv
maahoek: RT @biorxivpreprint: H3K4me3 is neither instructive for, nor informed by, transcription. https://t.co/F0v5RwyphZ #bioRxiv
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 11
Total Words: 0
Unique Words: 0

2.163 Mikeys
#2. What can we learn from over 100,000 Escherichia coli genomes?
Kaleb Z. Zion Abram, Zulema Udaondo, Carissa Bleker, Visanu Wanchai, Trudy M Wassenaar, David W Ussery
The explosion of microbial genome sequences in public databases allows for large-scale population studies of model organisms, such as Escherichia coli. We have examined more than one hundred thousand E. coli and Shigella genomes. After removing outliers, genomes were classified into two broad clusters based on a semi-automated Mash analysis, which distinguished 14 distinct phylotypes, graphically illustrated by Cytoscape. From a set of more than ten thousand good quality E. coli and Shigella genomes from GenBank, we find roughly 2,700 gene families in the E. coli species core, and more than 135,000 gene families in the E. coli pan-genome. Based on a set of 2,613 single-copy core proteins taken from one representative genome per phylotype, we constructed a robust phylogenetic tree. This is the largest E. coli genome dataset analyzed to date, and provides valuable insight into the population structure of the species.
Figures
None.
Tweets
AstrobioMike: RT @biorxiv_genomic: What can we learn from over 100,000 Escherichia coli genomes? https://t.co/KFUajcwrhm #biorxiv_genomic
IFB_Bioinfo: RT @biorxiv_genomic: What can we learn from over 100,000 Escherichia coli genomes? https://t.co/KFUajcwrhm #biorxiv_genomic
prashbio: RT @biorxiv_genomic: What can we learn from over 100,000 Escherichia coli genomes? https://t.co/KFUajcwrhm #biorxiv_genomic
pierre_marijon: RT @biorxiv_genomic: What can we learn from over 100,000 Escherichia coli genomes? https://t.co/KFUajcwrhm #biorxiv_genomic
kofi_little_boy: RT @biorxiv_genomic: What can we learn from over 100,000 Escherichia coli genomes? https://t.co/KFUajcwrhm #biorxiv_genomic
ybazetag: RT @biorxiv_genomic: What can we learn from over 100,000 Escherichia coli genomes? https://t.co/KFUajcwrhm #biorxiv_genomic
HeleneChiapello: RT @biorxiv_genomic: What can we learn from over 100,000 Escherichia coli genomes? https://t.co/KFUajcwrhm #biorxiv_genomic
vallenet: RT @biorxivpreprint: What can we learn from over 100,000 Escherichia coli genomes? https://t.co/zfxBEB7jK9 #bioRxiv
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 0
Unique Words: 0

2.157 Mikeys
#3. Prioritizing transcriptomic and epigenomic experiments by using an optimization strategy that leverages imputed data
Jacob Schreiber, Jeffrey Bilmes, William Stafford Noble
Successful science often involves not only performing experiments well, but also choosing well among many possible experiments. In a hypothesis generation setting, choosing an experiment well means choosing an experiment whose results are interesting or novel. In this work, we formalize this selection procedure in the context of genomics and epigenomics data generation. Specifically, we consider the task faced by a scientific consortium such as the National Institutes of Health ENCODE Consortium, whose goal is to characterize all of the functional elements in the human genome. Given a list of possible cell types or tissue types ("biosamples") and a list of possible high throughput sequencing assays, we ask "Which experiments should ENCODE perform next?" We demonstrate how to represent this task as an optimization problem, where the goal is to maximize the information gained in each successive experiment. Compared with previous work that has addressed a similar problem, our approach has the advantage that it can use imputed data to...
Figures
None.
Tweets
biorxivpreprint: Prioritizing transcriptomic and epigenomic experiments by using an optimization strategy that leverages imputed data https://t.co/eYhydjuPGz #bioRxiv
biorxiv_genomic: Prioritizing transcriptomic and epigenomic experiments by using an optimization strategy that leverages imputed data https://t.co/os7dae1r3U #biorxiv_genomic
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unique Words: 0

2.064 Mikeys
#4. Accuracy of de novo assembly of DNA sequences from double-digest libraries varies substantially among software
Melanie LaCava, Ellen Aikens, Libby Megna, Gregg Randolph, Charley Hubbard, Alex Buerkle
Advances in DNA sequencing have made it feasible to gather genomic data for non-model organisms and large sets of individuals, often using methods for sequencing subsets of the genome. Several of these methods sequence DNA associated with endonuclease restriction sites (various RAD and GBS methods). For use in taxa without a reference genome, these methods rely on de novo assembly of fragments in the sequencing library. Many of the software options available for this application were originally developed for other assembly types and we do not know their accuracy for reduced representation libraries. To address this important knowledge gap, we simulated data from the Arabidopsis thaliana and Homo sapiens genomes and compared de novo assemblies by six software programs that are commonly used or promising for this purpose (ABySS, CD-HIT, Stacks, Stacks2, Velvet and VSEARCH). We simulated different mutation rates and types of mutations, and then applied the six assemblers to the simulated datasets, varying assembly parameters. We...
Figures
None.
Tweets
disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
disequilibber: @ethanblinck @JonPuritz Today I can add some data/analyses to go with this recommendation. https://t.co/N1xdOLmYEz
mtanichthys: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
jilla_hamilton: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
m_matschiner: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
IntegratEcology: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
IKimirei: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
ecojydrology: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
Hubbard_Charley: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
liz_mandeville: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
SerTusso: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
PhishBiologist: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
LibbyMegna: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
sch_astrid: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
jessi_rick: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
Angelik_Cuevas: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
ErickGagne1: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
EcologistJosh: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
Tavoibrahim: RT @disequilibber: Interested in how assemblers compare with ddRAD or GBS data? We have some information about that: https://t.co/N1xdOLmYEz
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 0
Unique Words: 0

2.055 Mikeys
#5. Genomic analysis of pathogenic isolates of Vibrio cholerae from eastern Democratic Republic of the Congo (2014-2017)
Léonid M. Irenge, Jerôme Ambroise, Prudence N Mitangala, Bertrand Bearzatto, Raphaël K.S. Kabangwa, Jean-François Durant, Jean-Luc Gala
Background: In recent years, Vibrio cholerae has been associated with outbreaks in sub-Saharan Africa, notably in the Democratic Republic of the Congo (DRC). This study aimed to determine the genetic relatedness of isolates responsible for cholera outbreaks in eastern DRC between 2014 and 2017, and their potential spread to bordering countries. Methods/Principal findings: Phenotypic analysis and whole genome sequencing (WGS) were carried out on 78 clinical isolates of V. cholerae associated with cholera in eastern provinces of DRC between 2014 and 2017. SNP-based phylogenomic data show that most isolates (73/78) were V. cholerae O1 biotype El Tor with CTX-3 type prophage. They fell within the third transmission wave of the current seventh pandemic El Tor (7PET) lineage and were contained in the introduction event (T)10 in East Africa. These isolates clustered in two sub-clades corresponding to Multiple Locus Sequence Types (MLST) profiles ST69 and the newly assigned ST515, the latter displaying a higher genetic diversity....
Figures
None.
Tweets
biorxivpreprint: Genomic analysis of pathogenic isolates of Vibrio cholerae from eastern Democratic Republic of the Congo (2014-2017) https://t.co/d7t0BzvvXP #bioRxiv
biorxiv_genomic: Genomic analysis of pathogenic isolates of Vibrio cholerae from eastern Democratic Republic of the Congo (2014-2017) https://t.co/iRpvPdXGhy #biorxiv_genomic
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 7
Total Words: 0
Unique Words: 0

2.042 Mikeys
#6. An integrative ENCODE resource for cancer genomics
Jing Zhang, Donghoon Lee, Vineet Dhiman, Peng Jiang, Jie Xu, Patrick McGillivray, Hongbo Yang, Jason Liu, William Meyerson, Declan Clarke, Mengting Gu, Shantao Li, Shaoke Lou, Jinrui Xu, Lucas Lochovsky, Matthew Ung, Lijia Ma, Shan Yu, Qin Cao, Arif Harmanci, Koon-Kiu Yan, Anurag Sethi, Gamze Gursoy, Michael Rutenberg Schoenberg, Joel Rozowsky, Jonathan Warrell, Prashant Emani, Yucheng T Yang, Timur Galeev, Xiangmeng Kong, Shuang Liu, Xiaotong Li, Jayanth Krishnan, Yanlin Feng, Juan Carlos Rivera-Mulia, Jessica Adrian, James R Broach, Michael Bolt, Jennifer Moran, Dominic Fitzgerald, Vishnu Dileep, Tingting Liu, Shenglin Mei, Takayo Sasaki, Claudia Trevilla-Garcia, Su Wang, Yanli Wang, Chongzhi Zang, Daifeng Wang, Robert Klein, Michael Snyder, David M Gilbert, Kevin Yip, Chao Cheng, Feng Yue, Xiaole Shirley Liu, Kevin White, Mark B Gerstein
ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF....
Figures
Tweets
biorxivpreprint: An integrative ENCODE resource for cancer genomics https://t.co/WKFzSRg2o3 #bioRxiv
biorxiv_genomic: An integrative ENCODE resource for cancer genomics https://t.co/0n2bXRuScF #biorxiv_genomic
tangming2005: RT @biorxivpreprint: An integrative ENCODE resource for cancer genomics https://t.co/WKFzSRg2o3 #bioRxiv
alexcellfree: RT @biorxiv_genomic: An integrative ENCODE resource for cancer genomics https://t.co/0n2bXRuScF #biorxiv_genomic
prashbio: RT @biorxiv_genomic: An integrative ENCODE resource for cancer genomics https://t.co/0n2bXRuScF #biorxiv_genomic
sengupso: RT @biorxivpreprint: An integrative ENCODE resource for cancer genomics https://t.co/WKFzSRg2o3 #bioRxiv
ybazetag: RT @biorxiv_genomic: An integrative ENCODE resource for cancer genomics https://t.co/0n2bXRuScF #biorxiv_genomic
HTLVnet: RT @biorxivpreprint: An integrative ENCODE resource for cancer genomics https://t.co/WKFzSRg2o3 #bioRxiv
ansuman90: RT @biorxivpreprint: An integrative ENCODE resource for cancer genomics https://t.co/WKFzSRg2o3 #bioRxiv
Github
None.
Youtube
None.
Other stats
Sample Sizes : [1500, 492, 62, 365]
Authors: 58
Total Words: 9079
Unique Words: 3069

2.032 Mikeys
#7. Comparison of adopted and non-adopted individuals reveals gene-environment interplay for education in the UK Biobank
Rosa Cheesman, Avina Hunjan, Jonathan Coleman, Yasmin Ahmadzadeh, Robert Plomin, Tom A McAdams, Thalia C Eley, Gerome Breen
Individual-level polygenic scores can now explain ~10% of the variation in number of years of completed education. However, associations between polygenic scores and education capture not only genetic propensity but information about the environment that individuals are exposed to. This is because individuals passively inherit effects of parental genotypes, since their parents typically also provide the rearing environment. In other words, the strong correlation between offspring and parent genotypes results in an association between the offspring genotypes and the rearing environment. This is termed passive gene-environment correlation. We present an approach to test for the extent of passive gene-environment correlation for education without requiring intergenerational data. Specifically, we use information from 6311 individuals in the UK Biobank who were adopted in childhood to compare genetic influence on education between adoptees and non-adopted individuals. Adoptees' rearing environments are less correlated with their...
Figures
None.
Tweets
biorxivpreprint: Comparison of adopted and non-adopted individuals reveals gene-environment interplay for education in the UK Biobank https://t.co/bjsA1G7Qb5 #bioRxiv
biorxiv_genomic: Comparison of adopted and non-adopted individuals reveals gene-environment interplay for education in the UK Biobank https://t.co/JaEEyvml7B #biorxiv_genomic
PKoellinger: And here is the link to @RosaCheesman's preprint: https://t.co/Z6PNKhBHFL
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 8
Total Words: 0
Unique Words: 0

2.019 Mikeys
#8. Matching whole genomes to rare genetic disorders: Identification of potential causative variants using phenotype-weighted knowledge in the CAGI SickKids5 clinical genomes challenge
Lipika R. Pal, Kunal Kundu, Yizhou Yin, John Moult
Precise identification of causative variants from whole-genome sequencing data, including both coding and non-coding variants, is challenging. The CAGI5 SickKids clinical genome challenge provided an opportunity to assess our ability to extract such information. Participants in the challenge were required to match each of 24 whole-genome sequences to the correct phenotypic profile and to identify the disease class of each genome. These are all rare disease cases that have resisted genetic diagnosis in a state-of-the-art pipeline. The patients have a range of eye, neurological, and connective-tissue disorders. We used a gene-centric approach to address this problem, assigning each gene a multi-phenotype-matching score. Mutations in the top scoring genes for each phenotype profile were ranked on a six-point scale of pathogenicity probability, resulting in an approximately equal number of top ranked coding and non-coding candidate variants overall. We were able to assign the correct disease class for 12 cases and the correct genome...
Figures
Tweets
biorxivpreprint: Matching whole genomes to rare genetic disorders: Identification of potential causative variants using phenotype-weighted knowledge in the CAGI SickKids5 clinical genomes challenge https://t.co/JzMKITx2IQ #bioRxiv
biorxiv_genomic: Matching whole genomes to rare genetic disorders: Identification of potential causative variants using phenotype-weighted knowledge ... https://t.co/kS9X0a33wG #biorxiv_genomic
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 13162
Unique Words: 3028

2.018 Mikeys
#9. DNA Methylation Biomarkers Of Myocardial Infarction And Cardiovascular Disease
Alba Fernandez-Sanles, Sergi Sayols-Baixeras, Isaac Subirana, Mariano Senti, Silvia Perez-Fernandez, Manuel Castro de Moura, Manel Esteller, Jaume Marrugat, Roberto Elosua
Background: DNA methylation is associated with atherosclerosis and cardiovascular risk factors. However, little evidence regarding its association with cardiovascular diseases in large studies is currently available. We aimed to assess the association between DNA methylation and cardiovascular events, and to determine both the predictive capacity of the identified loci and the causality of those associations. Methods: We defined two strategies: epigenome-wide (EWAS) and candidate-gene association studies. In both strategies, we designed one approach with prevalent cases of coronary heart disease (CHD) and another with incident cases of CHD and cardiovascular disease. We used data from three independent cohorts: the REgistre GIroni del COR (REGICOR) study, the Framingham Offspring Study and the Women's Health Initiative. We also assessed the association between the identified CpGs and cardiovascular risk factors in the three populations. Then, we developed methylation risk scores to evaluate whether their inclusion in the...
Figures
None.
Tweets
biorxivpreprint: DNA Methylation Biomarkers Of Myocardial Infarction And Cardiovascular Disease https://t.co/1GX2NCpfui #bioRxiv
biorxiv_genomic: DNA Methylation Biomarkers Of Myocardial Infarction And Cardiovascular Disease https://t.co/ZW8gN6Gg70 #biorxiv_genomic
sbotlite: RT @biorxivpreprint: DNA Methylation Biomarkers Of Myocardial Infarction And Cardiovascular Disease https://t.co/1GX2NCpfui #bioRxiv
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 9
Total Words: 0
Unique Words: 0

2.014 Mikeys
#10. Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long read data
Ellie E. Armstrong, Ryan W. Taylor, Danny E Miller, Christopher Kaelin, Gregory Barsh, Elizabeth A Hadly, Dmitri Petrov
The lion (Panthera leo) is one of the most popular and iconic feline species on the planet, yet in spite of its popularity, the last century has seen massive declines for lion populations worldwide. Genomic resources for endangered species represent an important way forward for the field of conservation, enabling high-resolution studies of demography, disease, and population dynamics. Here, we present a chromosome-level assembly for the captive African lion from the Exotic Feline Rescue Center as a resource for current and subsequent genetic work of the sole social species of the Panthera clade. Our assembly is composed of 10x Genomics Chromium data, Dovetail Hi-C, and Oxford Nanopore long-read data. Synteny is highly conserved between the lion, other Panthera genomes, and the domestic cat. We find variability in the length and levels of homozygosity across the genomes of the lion sequenced here and other previously published resequencing data, indicating contrasting histories of recent and ancient small population sizes and/or...
Figures
None.
Tweets
DTGenomics: "It is our hope that this genome will enable a new generation of high quality genomic studies of the lion, in addition to comparative studies across Felidae." #LionKingGenomeAssembly #DovetailHiC #AGenomeFitForaKing @LizHadly @PetrovADmitri @_ellie_cat https://t.co/sBsyDK7nat https://t.co/H5fPgv1ELW
DTGenomics: Ah Zabenyaaaa! Presenting the #LionKingGenomeAssembly TODAY! Available at a biorxiv near you #LongLiveTheKing #AGenomeFitForaKing #ProtectThePride #DovetailHiC @LizHadly @PetrovADmitri @_ellie_cat @10xGenomics @nanopore https://t.co/sBsyDK7nat https://t.co/KMps4y44Hs
razoralign: Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long read data https://t.co/mhFFANIz65 https://t.co/bkiO0g3gTy
BioRxivCurator: Long live the king: chromosome-level assembly of the lion (Panthera leo) using linked-read, Hi-C, and long read data https://t.co/9kOMse0iJP
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 7
Total Words: 0
Unique Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 160,428 papers.
