Top 10 bioRxiv Papers Today in Genetics


2.193 Mikeys
#1. Efficient toolkit implementing best practices for principal component analysis of population genetic data
Florian Privé, Keurcien Luu, Michael G.B. Blum, John J. McGrath, Bjarni J. Vilhjálmsson
Principal Component Analysis (PCA) of genetic data is routinely used to infer ancestry and control for population structure in various genetic analyses. However, conducting PCA analyses can be complicated and has several potential pitfalls. These pitfalls include (1) capturing Linkage Disequilibrium (LD) structure instead of population structure, (2) projected PCs that suffer from shrinkage bias when projecting PCA from a reference dataset to another independent dataset, (3) detecting sample outliers, and (4) uneven population sizes. In this work, we explore these potential issues when using PCA, and present efficient solutions to these. Following applications to the UK Biobank and the 1000 Genomes project datasets, we make recommendations for best practices and provide efficient and user-friendly implementations of the proposed solutions in R packages bigsnpr and bigutilsr. For example, we show that PC19 to PC40 in the UK Biobank capture LD structure. Using our automatic algorithm for removing long-range LD regions, we recover 16...
Figures
None.
Tweets
biorxivpreprint: Efficient toolkit implementing best practices for principal component analysis of population genetic data https://t.co/tmPIkKVmde #bioRxiv
biorxiv_genetic: Efficient toolkit implementing best practices for principal component analysis of population genetic data https://t.co/SMtRncQCPu #biorxiv_genetic
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unique Words: 0

2.193 Mikeys
#2. Admixture/fine-mapping in Brazilians reveals a West African associated potential regulatory variant (rs114066381) with a strong female-specific effect on body mass- and fat mass-indexes
Marilia O Scliar, Hanaisa P Sant Anna, Meddly L Santolalla, Thiago P Leal, Nathalia M Araujo, Isabela Alvim, Victor Borda, Wagner CS Magalhães, Mateus H Gouveia, Ricardo Lyra, Moara Machado, Lucas Michelin, Maíra R Rodrigues, Gilderlanio S Araújo, Fernanda SG Kehdy, Camila Zolini, Sérgio V Peixoto, Marcelo Luizon, Francisco P Lobo, Michel S Naslavsky, Guilherme L Yamamoto, Yeda AO Duarte, Matthew EB Hansen, Shane A Norris, Robert H Gilman, Heinner Guio, Ann Hsing, Sam M Mbulaiteye, James Mensah, Julie Dutil, Meredith Yeager, Edward Yeboah, Sarah A Tishkoff, Ananyo Choudhury, Michele Ramsay, Maria Rita Passos-Bueno, Mayana Zatz, Timothy D. O'Connor, Alexandre C Pereira, Mauricio L Barreto, Maria Fernanda Lima-Costa, Bernardo L Horta, Eduardo Tarazona-Santos
Admixed populations are a resource to study the global genetic architecture of complex phenotypes, which is critical, considering that non-European populations are severely under-represented in genomic studies. Leveraging admixture in Brazilians, whose chromosomes are mosaics of fragments of Native American, European and African origins, we used genome-wide data to perform admixture mapping/fine-mapping of Body Mass Index (BMI) in three population-based cohorts from Northeast (Salvador), Southeast (Bambui) and South (Pelotas) of the country. We found significant associations with African-associated alleles in children from Salvador (PALD1 and ZMIZ1 genes), and in young adults from Pelotas (NOD2 and MTUS2 genes). More importantly, in Pelotas, rs114066381, mapped in a potential regulatory region, is significantly associated only in females (p = 2.76e-06). This variant is very rare in Europeans but with frequencies of ~3% in West Africa, and has a strong female-specific effect (95% CI: 2.32-5.65 kg/m² per A allele). We replicated...
Figures
None.
Tweets
biorxivpreprint: Admixture/fine-mapping in Brazilians reveals a West African associated potential regulatory variant (rs114066381) with a strong female-specific effect on body mass- and fat mass-indexes https://t.co/DdyihGPadH #bioRxiv
biorxiv_genetic: Admixture/fine-mapping in Brazilians reveals a West African associated potential regulatory variant (rs114066381) with a strong ... https://t.co/9gH6nVqo3Z #biorxiv_genetic
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 43
Total Words: 0
Unique Words: 0

2.019 Mikeys
#3. Novel transformer networks for improved sequence labeling in genomics
Jim Clauwaert, Willem Waegeman
In genomics, a wide range of machine learning methods is used to annotate biological sequences w.r.t. interesting positions such as transcription start sites, translation initiation sites, methylation sites, splice sites, promoter start sites, etc. In recent years, this area has been dominated by convolutional neural networks, which typically outperform older methods as a result of automated scanning for influential sequence motifs. As an alternative, we introduce in this paper transformer architectures for whole-genome sequence labeling tasks. We show that those architectures, which have been recently introduced for natural language processing, allow fast processing of long DNA sequences. We optimize existing networks and define a new way to calculate attention, resulting in state-of-the-art performance. To demonstrate this, we evaluate our transformer model architecture on several sequence labeling tasks, and find it to outperform specialized models for the annotation of transcription start sites, translation initiation...
Figures
None.
Tweets
biorxivpreprint: Novel transformer networks for improved sequence labeling in genomics https://t.co/MZ7Xcq8HXz #bioRxiv
biorxiv_genetic: Novel transformer networks for improved sequence labeling in genomics https://t.co/OaBVeTwWWk #biorxiv_genetic
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

2.002 Mikeys
#4. Eliciting priors and relaxing the single causal variant assumption in colocalisation analyses
Chris Wallace
Horizontal integration of summary statistics from different GWAS traits can be used to evaluate evidence for their shared genetic causality. One popular method to do this is a Bayesian method, coloc, which is attractive in requiring only GWAS summary statistics and no linkage disequilibrium estimates and is now being used routinely to perform thousands of comparisons between traits. Here we show that while most users do not adjust default software values, misspecification of prior parameters can substantially alter posterior inference. We suggest data driven methods to derive sensible prior values, and demonstrate how sensitivity analysis can be used to assess robustness of posterior inference. The flexibility of coloc comes at the expense of an unrealistic assumption of a single causal variant per trait. This assumption can be relaxed by stepwise conditioning, but this requires external software and an LD matrix aligned to study alleles. We have now implemented conditioning within coloc, and propose a new alternative method,...
Figures
None.
Tweets
casey6r0wn: RT @biorxiv_genetic: Eliciting priors and relaxing the single causal variant assumption in colocalisationanalyses https://t.co/wherVc7jrG…
sbguarch: RT @biorxiv_genetic: Eliciting priors and relaxing the single causal variant assumption in colocalisationanalyses https://t.co/wherVc7jrG…
egamazon: RT @biorxiv_genetic: Eliciting priors and relaxing the single causal variant assumption in colocalisationanalyses https://t.co/wherVc7jrG…
williamreay96: RT @biorxiv_genetic: Eliciting priors and relaxing the single causal variant assumption in colocalisationanalyses https://t.co/wherVc7jrG…
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 9812
Unique Words: 2537
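
The abstract's point that prior values can change posterior inference can be illustrated with a minimal sketch of the single-causal-variant posterior calculation. This is an illustration under stated assumptions, not the coloc package's code: the function name `coloc_posteriors`, the prior names `p1`, `p2`, `p12`, their default values, and the toy log Bayes factors are all assumptions introduced here. The five hypotheses are taken to be H0 (no association), H1 and H2 (association with one trait only), H3 (two distinct causal variants) and H4 (one shared causal variant).

```python
import math


def coloc_posteriors(log_abf1, log_abf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 from per-SNP log approximate Bayes factors.

    p1, p2: assumed prior probability that a SNP is causal for trait 1 / trait 2 only.
    p12:    assumed prior probability that a SNP is causal for both traits.
    """
    abf1 = [math.exp(x) for x in log_abf1]
    abf2 = [math.exp(x) for x in log_abf2]
    s1 = sum(abf1)
    s2 = sum(abf2)
    s12 = sum(a * b for a, b in zip(abf1, abf2))
    weights = [
        1.0,                         # H0: no causal variant for either trait
        p1 * s1,                     # H1: causal variant for trait 1 only
        p2 * s2,                     # H2: causal variant for trait 2 only
        p1 * p2 * (s1 * s2 - s12),   # H3: two different causal variants
        p12 * s12,                   # H4: one shared causal variant
    ]
    total = sum(weights)
    return [w / total for w in weights]


# Toy region of three SNPs (made-up log Bayes factors, for illustration only).
toy_trait1 = [0.0, 8.0, 1.0]
toy_trait2 = [0.5, 7.0, 0.0]

# Simple sensitivity analysis: recompute the H4 posterior over a range of p12.
for prior in (1e-6, 5e-6, 1e-5):
    pp = coloc_posteriors(toy_trait1, toy_trait2, p12=prior)
    print(f"p12={prior:g}  PP(H4)={pp[4]:.3f}")
```

Running the loop shows the posterior for a shared variant moving with `p12` alone while the data stay fixed, which is the kind of sensitivity check the abstract recommends.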

2.0 Mikeys
#5. Delineation of the SUMO-Modified Proteome Reveals Regulatory Functions Throughout Meiosis
Nikhil R Bhagwat, Shannon Owens, Masaru Ito, Jay Boinapalli, Philip Poa, Alexander Ditzel, Srujan Kopparapu, Meghan Mahalawat, Owen R Davies, Sean R Collins, Jeffrey Johnson, Nevan J Krogan, Neil Hunter
Protein modification by SUMO helps orchestrate the elaborate events of meiosis to faithfully produce haploid gametes. To date, only a handful of meiotic SUMO targets have been identified. Here we delineate a multidimensional SUMO-modified meiotic proteome in budding yeast, identifying 2747 conjugation sites in 775 targets, and defining their relative levels and dynamics. Modified sites cluster in disordered regions and only a minority match consensus motifs. Target identities and modification dynamics imply that SUMOylation regulates all levels of chromosome organization and each step of homologous recombination. Execution-point analysis confirms these inferences, revealing functions for SUMO in S-phase, the initiation of recombination, chromosome synapsis and crossing over. K15-linked SUMO chains become prominent as chromosomes synapse and recombine, consistent with roles in these processes. SUMO also modifies ubiquitin, forming hybrid oligomers with potential to modulate ubiquitin signaling. We conclude that SUMO plays diverse...
Figures
Tweets
meiosis_papers: Delineation of the SUMO-Modified Proteome Reveals Regulatory Functions Throughout Meiosis | Hunter N https://t.co/h8C7VheTmW
PromPreprint: Delineation of the SUMO-Modified Proteome Reveals Regulatory Functions Throughout Meiosis https://t.co/Mc7RwNocVR
Marcel_d93: RT @biorxivpreprint: Delineation of the SUMO-Modified Proteome Reveals Regulatory Functions Throughout Meiosis https://t.co/Fbi2p51E04 #bi…
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 13
Total Words: 22191
Unique Words: 5730

1.998 Mikeys
#6. Type-2 diabetes with low LDL-C: genetic insights into a unique phenotype
Yann C. Klimentidis, Amit Arora, Michelle Newell, Jin Zhou, Jose M Ordovas, Benjamin J Renquist, Alexis C Wood
Although hyperlipidemia is traditionally considered a risk factor for type-2 diabetes (T2D), evidence has emerged from statin trials and candidate gene investigations suggesting that lower LDL-C increases T2D risk. We thus sought to comprehensively examine the phenotypic and genotypic relationships of LDL-C with T2D. Using data from the UK Biobank, we found that LDL-C was negatively associated with T2D (OR = 0.43 [0.41, 0.45] per mmol/L unit of LDL-C), despite positive associations of LDL-C with HbA1c and BMI. We then performed the first genome-wide exploration of variants simultaneously associated with lower LDL-C and increased T2D risk, using data on LDL-C from the UK Biobank (n = 431,167) and the GLGC consortium (n = 188,577), and T2D from the DIAGRAM consortium (n = 898,130). We identified 31 loci associated with lower LDL-C and increased T2D, capturing several potential mechanisms. Seven of these loci have previously been identified for this phenotype, and 9 have previously been implicated in non-alcoholic fatty liver disease....
Figures
None.
Tweets
UnsilencedSci: RT @biorxiv_genetic: Type-2 diabetes with low LDL-C: genetic insights into a unique phenotype https://t.co/akP14nxiTX #biorxiv_genetic
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 7
Total Words: 0
Unique Words: 0

1.997 Mikeys
#7. Elevated polygenic burden for ASD is associated with the broad autism phenotype
Kritika Nayar, Julia M Sealock, Nell Maltman, Lauren Bush, Edwin H Cook, Lea K Davis, Molly Losh
Background: Autism spectrum disorder (ASD) is a multifactorial, neurodevelopmental disorder that encompasses a complex and heterogeneous set of traits. Subclinical traits that mirror the core features of ASD, referred to as the broad autism phenotype (BAP), have been documented repeatedly in unaffected relatives and are believed to reflect underlying genetic liability to ASD. The BAP may help inform the etiology of ASD by allowing the stratification of families into more phenotypically and etiologically homogeneous subgroups. This study explored polygenic scores related to the BAP. Methods: Phenotypic and genotypic information were obtained from 2,614 trios from the Simons Simplex Sample. Polygenic scores of ASD (ASD-PGS) were generated across the sample to determine the shared genetic overlap between the BAP and ASD. Maternal and paternal ASD-PGS were explored in relation to BAP traits and their child's ASD symptomatology. Results: Maternal pragmatic language was related to the child's social communicative atypicalities. In fathers, rigid...
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : [112]
Authors: 7
Total Words: 7938
Unique Words: 2430
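
The abstract does not say how the ASD-PGS were computed. As a generic illustration only, a polygenic score is commonly defined as a weighted sum of effect-allele dosages, with per-variant weights taken from an external association study. The sketch below assumes that definition; the function names, the toy dosages, and the toy weights are hypothetical and are not taken from the paper.

```python
import math


def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) for one individual."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


def standardize(scores):
    """Convert raw scores to z-scores within the sample (population SD)."""
    mean = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / len(scores))
    if sd == 0:
        raise ValueError("scores have zero variance")
    return [(s - mean) / sd for s in scores]


# Hypothetical per-variant weights (e.g. log odds ratios) for three variants.
toy_weights = [0.10, -0.20, 0.05]

# Hypothetical dosages for three individuals at the same three variants.
toy_genotypes = [
    [0, 1, 2],
    [2, 0, 1],
    [1, 1, 1],
]

raw = [polygenic_score(g, toy_weights) for g in toy_genotypes]
z = standardize(raw)
print(raw, z)
```

Standardizing within the sample puts scores on a common scale before relating them to traits; real pipelines also handle variant selection, LD, and ancestry, which this sketch leaves out.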

1.997 Mikeys
#8. Development of microsatellite markers for the threatened species Coleocephalocereus purpureus (Cactaceae) using next-generation sequencing
Daphne Amaral Fraga, Anderson Figueiredo Carvalho, Ricardo Souza Santana, Marlon Camara Machado, Gustavo Augusto Lacorte
Ten microsatellite loci were developed and validated for the endangered cactus species Coleocephalocereus purpureus. The markers were obtained from sequences generated by whole genome shotgun sequencing approaches. A testing group of 36 specimens of the main grouping was genotyped, and all described markers presented outcomes suitable for population genetic studies, showing polymorphic status for the C. purpureus testing group with clean and reproducible amplification. No evidence for scoring errors, null alleles or linkage disequilibrium was detected. The number of alleles per locus ranged from 3 to 6 and expected heterozygosity ranged from 0.78 to 0.99. These new microsatellite loci are suitable for use in future population diversity and structure studies of C. purpureus.
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unique Words: 0

1.997 Mikeys
#9. Oligogenic rare variant contributions in schizophrenia and their convergence with genes harboring de novo mutations in schizophrenia, autism, and intellectual disability: Evidence from multiplex families
Jibin John, Prachi Kukshal, Triptish Bhatia, Ricardo Harripaul, Vishwajit L Nimgaonkar, S N Deshpande, BK Thelma
Clinical and genetic heterogeneity has been documented extensively in schizophrenia, a common behavioural disorder with heritability estimates of about 80%. Common and rare de novo variant based studies have provided notable evidence for the likely involvement of a range of pathways including glutamatergic, synaptic signalling and neurodevelopment. To complement these studies, we sequenced exomes of 11 multimember affected schizophrenia families from India. Variant prioritisation performed based on their rarity (MAF <0.01), shared presence among the affected individuals in the respective families and predicted deleterious nature, yielded a total of 785 inherited rare protein sequence altering variants in 743 genes among the 11 families. These showed an enrichment of genes involved in the extracellular matrix and cytoskeleton components, synaptic and neuron related ontologies and neurodevelopmental pathways, consistent with major etiological hypotheses. We also noted an overrepresentation of genes from previously reported gene sets...
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : [643, 1683, 4484, 51, 885, 106, 906, 990, 1192, 325, 3455, 3528, 297, 3230, 1732, 788, 939, 785, 743]
Authors: 7
Total Words: 11277
Unique Words: 3830

1.997 Mikeys
#10. Maximum likelihood estimation of natural selection and allele age from time series data of allele frequencies
Zhangyi He, Xiaoyang Dai, Mark Ashton Beaumont, Feng Yu
Thanks to advances in ancient DNA preparation and sequencing techniques, time serial samples of segregating alleles are becoming more widely available in ancestral populations. Such time series data allow for more accurate inference of population genetic parameters and hypothesis testing on the recent action of natural selection. Here we develop a likelihood-based method for co-estimating the selection coefficient and the allele age from allele frequency time series data. Our method is built on the hidden Markov model incorporating the Wright-Fisher diffusion conditioned to survive until the time of the most recent sample, which circumvents the assumption required in existing methods that the allele is created by mutation at a certain small frequency. We calculate the likelihood by numerically solving the Kolmogorov backward equation resulting from the conditioned Wright-Fisher diffusion backwards in time and re-weighting the solution by the emission probabilities of the observation at each sampling time point, which allows for a...
Figures
None.
Tweets
biorxivpreprint: Maximum likelihood estimation of natural selection and allele age from time series data of allele frequencies https://t.co/cuwybcqXfS #bioRxiv
biorxiv_genetic: Maximum likelihood estimation of natural selection and allele age from time series data of allele frequencies https://t.co/ky8hpYyrny #biorxiv_genetic
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unique Words: 0
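
The hidden Markov structure described in the abstract (a latent allele frequency trajectory observed through samples at discrete time points) can be sketched with a much simpler stand-in. The code below is not the authors' method: it swaps the conditioned Wright-Fisher diffusion and the Kolmogorov backward equation solver for a discrete-generation Wright-Fisher simulation with genic selection, and it assumes binomial sampling as the emission model. All function names and parameter values are hypothetical.

```python
import math
import random


def wf_step(x, s, two_n, rng):
    """One generation: genic selection (fitness 1 + s) followed by binomial drift."""
    x_sel = x * (1 + s) / (1 + s * x)
    count = sum(1 for _ in range(two_n) if rng.random() < x_sel)
    return count / two_n


def simulate_trajectory(x0, s, two_n, generations, rng):
    """Latent allele frequency path of length generations + 1."""
    traj = [x0]
    for _ in range(generations):
        traj.append(wf_step(traj[-1], s, two_n, rng))
    return traj


def emission_loglik(k, n, x):
    """log P(k mutant copies among n sampled chromosomes | frequency x)."""
    if x <= 0.0:
        return 0.0 if k == 0 else -math.inf
    if x >= 1.0:
        return 0.0 if k == n else -math.inf
    return math.log(math.comb(n, k)) + k * math.log(x) + (n - k) * math.log(1 - x)


# Simulate a latent path, then draw time series samples every 10 generations.
rng = random.Random(42)
path = simulate_trajectory(x0=0.1, s=0.02, two_n=200, generations=50, rng=rng)
for t in range(0, 51, 10):
    n = 20
    k = sum(1 for _ in range(n) if rng.random() < path[t])
    print(t, round(path[t], 3), k, round(emission_loglik(k, n, path[t]), 3))
```

A likelihood-based method would sum over all latent paths rather than condition on one simulated path; the sketch only shows the two ingredients, the transition model and the emission probability, that such a calculation combines.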

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 222,145 papers.
