Top 10 bioRxiv Papers Today in Bioinformatics


2.102 Mikeys
#1. SNP-CRISPR: a web tool for SNP-specific genome editing
Chiao-Lin Chen, Jonathan Rodiger, Verena Chung, Raghuvir Viswanatha, Stephanie E. Mohr, Yanhui Hu, Norbert Perrimon
CRISPR-Cas9 is a powerful genome editing technology in which a short guide RNA (sgRNA) confers target site specificity to achieve Cas9-mediated genome editing. Numerous sgRNA design tools have been developed based on reference genomes for humans and model organisms. However, existing resources are not optimal as genetic mutations or single nucleotide polymorphisms (SNPs) within the targeting region affect the efficiency of CRISPR-based approaches by interfering with guide-target complementarity. To facilitate identification of sgRNAs (1) in non-reference genomes, (2) across varying genetic backgrounds, or (3) for specific targeting of SNP-containing alleles, for example, disease relevant mutations, we developed a web tool, SNP-CRISPR (https://www.flyrnai.org/tools/snp_crispr/). SNP-CRISPR can be used to design sgRNAs based on public variant data sets or user-identified variants. In addition, the tool computes efficiency and specificity scores for sgRNA designs targeting both the variant and the reference. Moreover, SNP-CRISPR...
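The idea of targeting a SNP-containing allele can be illustrated with a minimal sketch. This is not SNP-CRISPR's actual pipeline: it assumes SpCas9 with a 20-nt protospacer followed by an NGG PAM, scans only the forward strand, and computes no efficiency or specificity scores. The helper name `guides_covering_snp`, its parameters, and the example sequence are hypothetical.

```python
import re


def guides_covering_snp(seq, snp_pos, alt_base, guide_len=20):
    """List forward-strand candidate guides that overlap a SNP.

    seq       reference sequence (string of A/C/G/T)
    snp_pos   0-based position of the SNP in seq
    alt_base  variant base to substitute at snp_pos
    Returns (start, protospacer, pam) tuples taken from the variant allele.
    Assumption: SpCas9-style 20-nt protospacer followed by an NGG PAM.
    """
    seq = seq.upper()
    # Build the variant allele by substituting the alternate base.
    variant = seq[:snp_pos] + alt_base.upper() + seq[snp_pos + 1:]
    # A lookahead lets overlapping candidate sites all be reported.
    pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len
    hits = []
    for match in re.finditer(pattern, variant):
        start = match.start()
        # Keep only guides whose protospacer spans the SNP position.
        if start <= snp_pos < start + guide_len:
            hits.append((start, match.group(1), match.group(2)))
    return hits


example = "ACGTACGTACGTACGTACGTTGGAAAA"
print(guides_covering_snp(example, snp_pos=10, alt_base="A"))
```

A fuller design would also scan the reverse strand (CCN on the forward strand) and compare each guide against the reference allele, as the abstract describes for the tool itself.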
Figures
Tweets
biorxivpreprint: SNP-CRISPR: a web tool for SNP-specific genome editing https://t.co/eWYIFnWsEA #bioRxiv
biorxiv_bioinfo: SNP-CRISPR: a web tool for SNP-specific genome editing https://t.co/aZFnjpnZyP #biorxiv_bioinfo
sabahzero: RT @biorxiv_bioinfo: SNP-CRISPR: a web tool for SNP-specific genome editing https://t.co/aZFnjpnZyP #biorxiv_bioinfo
PrecursorCell: RT @biorxiv_bioinfo: SNP-CRISPR: a web tool for SNP-specific genome editing https://t.co/aZFnjpnZyP #biorxiv_bioinfo
jdmontenegroc: RT @biorxiv_bioinfo: SNP-CRISPR: a web tool for SNP-specific genome editing https://t.co/aZFnjpnZyP #biorxiv_bioinfo
shaman_ns: RT @biorxiv_bioinfo: SNP-CRISPR: a web tool for SNP-specific genome editing https://t.co/aZFnjpnZyP #biorxiv_bioinfo
aTailaTheWun: RT @biorxivpreprint: SNP-CRISPR: a web tool for SNP-specific genome editing https://t.co/eWYIFnWsEA #bioRxiv
hdeshmuk: RT @biorxiv_bioinfo: SNP-CRISPR: a web tool for SNP-specific genome editing https://t.co/aZFnjpnZyP #biorxiv_bioinfo
Github

SNP-targeted CRISPR design pipeline

Repository: snp_crispr
User: jrodiger
Language: Python
Stargazers: 0
Subscribers: 0
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 7
Total Words: 5326
Unique Words: 2022

2.029 Mikeys
#2. Tempora: cell trajectory inference using time-series single-cell RNA sequencing data
Thinh N Tran, Gary Bader
Single-cell RNA sequencing (scRNAseq) can map cell types, states and transitions during dynamic biological processes such as development and regeneration. Many trajectory inference methods have been developed to order cells by their progression through a dynamic process. However, when time series data is available, these methods do not consider the available time information when ordering cells and are instead designed to work only on a single scRNAseq data snapshot. We present Tempora, a novel cell trajectory inference method that orders cells using time information from time-series scRNAseq data. In performance comparison tests, Tempora accurately inferred developmental lineages in human skeletal myoblast differentiation and murine cerebral cortex development, beating state of the art methods. Tempora uses biological pathway information to help identify cell type relationships and can identify important time-dependent pathways to help interpret the inferred trajectory. Our results demonstrate the utility of time information to...
Figures
Tweets
razoralign: Tempora: cell trajectory inference using time-series single-cell RNA sequencing data https://t.co/x1VNHenx39 https://t.co/eJ2IWaNwQU
PromPreprint: Tempora: cell trajectory inference using time-series single-cell RNA sequencing data https://t.co/8aH9Bas3lj
BioRxivCurator: Tempora: cell trajectory inference using time-series single-cell RNA sequencing data https://t.co/50eJrbJlh2
KKami1115: RT @biorxiv_bioinfo: Tempora: cell trajectory inference using time-series single-cell RNA sequencing data https://t.co/yNe1mPMMCB #biorxiv…
genolib_19: RT @biorxivpreprint: Tempora: cell trajectory inference using time-series single-cell RNA sequencing data https://t.co/blJgQ4Uh5y #bioRxiv
itsjeffreyy76: RT @biorxiv_bioinfo: Tempora: cell trajectory inference using time-series single-cell RNA sequencing data https://t.co/yNe1mPMMCB #biorxiv…
PolymorphismJ: RT @biorxivpreprint: Tempora: cell trajectory inference using time-series single-cell RNA sequencing data https://t.co/blJgQ4Uh5y #bioRxiv
Github

Pathway-based trajectory inference method for time-series scRNAseq data

Repository: Tempora
User: BaderLab
Language: R
Stargazers: 3
Subscribers: 11
Forks: 0
Open Issues: 1
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 8392
Unique Words: 2195

2.019 Mikeys
#3. Comprehensive analysis of structural variants in breast cancer genomes using single molecule sequencing
Sergey Aganezov, Sara Goodwin, Rachel Sherman, Fritz J. Sedlazeck, Gayatri Arun, Sonam Bhatia, Isac Lee, Melanie Kirsche, Robert Wappel, Melissa Kramer, Karen Kostroff, David L. Spector, Winston Timp, W. Richard McCombie, Michael C. Schatz
Improved identification of structural variants (SVs) in cancer can lead to more targeted and effective treatment options as well as advance our basic understanding of disease progression. We performed whole genome sequencing of the SKBR3 breast cancer cell-line and patient-derived tumor and normal organoids from two breast cancer patients using 10X/Illumina, PacBio, and Oxford Nanopore sequencing. We then inferred SVs and large-scale allele-specific copy number variants (CNVs) using an ensemble of methods. Our findings demonstrate that long-read sequencing allows for substantially more accurate and sensitive SV detection, with between 90% and 95% of variants supported by each long-read technology also supported by the other. We also report high accuracy for long-reads even at relatively low coverage (25x-30x). Furthermore, we inferred karyotypes from these data using our enhanced RCK algorithm to present a more accurate representation of the mutated cancer genomes, and find hundreds of variants affecting known cancer-related genes...
Figures
None.
Tweets
fordham_dan: https://t.co/S150qqO83T "robust SV detection is possible at relatively low ~30x avg cov. w/ ONT or PacBio sequencing. When applied at scale, costs for 30x is < $1k per sample for ONT PromethION & < $2k for PacBio Sequel II, which is highly comparable to ~$800/$1k (ILMN/10X)
boti_ka: RT @fordham_dan: https://t.co/S150qqO83T "robust SV detection is possible at relatively low ~30x avg cov. w/ ONT or PacBio sequencing. Whe…
colindaven: RT @fordham_dan: https://t.co/S150qqO83T "robust SV detection is possible at relatively low ~30x avg cov. w/ ONT or PacBio sequencing. Whe…
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 15
Total Words: 0
Unique Words: 0

2.018 Mikeys
#4. Semi-supervised identification of cell populations in single-cell ATAC-seq
Pawel F Przytycki, Katherine S Pollard
Identifying high-confidence cell-type specific open chromatin regions with coherent regulatory function from single-cell open chromatin data (scATAC-seq) is difficult due to the complexity of resolving cell types given the low coverage of reads per cell. In order to address this problem, we present Semi-Supervised Identification of Populations of cells in scATAC-seq data (SSIPs), a semi-supervised approach that integrates bulk and single-cell data through a generalizable network model featuring two types of nodes. Nodes of the first type represent cells from scATAC-seq with edges between them encoding information about cell similarity. A second set of nodes represents "supervising" datasets connected to cell nodes with edges that encode the similarity between that data and each cell. Via global calculations of network influence, this model allows us to quantify the influence of bulk data on scATAC-seq data and estimate the contributions of scATAC-seq cell populations to signals in bulk data. Using simulated data, we show that...
Figures
None.
Tweets
razoralign: SSIPs: Semi-supervised identification of cell populations in single-cell ATAC-seq https://t.co/2q9zLFDKEA https://t.co/dDXojV5G2c
Coco_Lucho2: RT @biorxivpreprint: Semi-supervised identification of cell populations in single-cell ATAC-seq https://t.co/y08g5BazHc #bioRxiv
TH_KUO: RT @biorxiv_bioinfo: Semi-supervised identification of cell populations in single-cell ATAC-seq https://t.co/rhlMVFsjK8 #biorxiv_bioinfo
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

2.017 Mikeys
#5. DeepSide: A Deep Learning Framework for Drug Side Effect Prediction
Onur Can Uner, Ramazan Gokberk Cinbis, Oznur Tastan, A. Ercument Cicek
Drug failures due to unforeseen adverse effects at clinical trials pose health risks for the participants and lead to substantial financial losses. Side effect prediction algorithms have the potential to guide the drug design process. LINCS L1000 dataset provides a vast resource of cell line gene expression data perturbed by different drugs and creates a knowledge base for context specific features. The state-of-the-art approach that aims at using context specific information relies on only the high-quality experiments in LINCS L1000 and discards a large portion of the experiments. In this study, our goal is to boost the prediction performance by utilizing this data to its full extent. We experiment with 5 deep learning architectures. We find that a multi-modal architecture produces the best predictive performance among multi-layer perceptron-based architectures when drug chemical structure (CS), and the full set of drug perturbed gene expression profiles (GEX) are used as modalities. Overall, we observe that the CS is more...
Figures
None.
Tweets
biorxivpreprint: DeepSide: A Deep Learning Framework for Drug Side Effect Prediction https://t.co/lCiMWWBIwW #bioRxiv
biorxiv_bioinfo: DeepSide: A Deep Learning Framework for Drug Side Effect Prediction https://t.co/PPmoRDlQEZ #biorxiv_bioinfo
rkakamilan: RT @biorxivpreprint: DeepSide: A Deep Learning Framework for Drug Side Effect Prediction https://t.co/lCiMWWBIwW #bioRxiv
ma63713534: RT @biorxivpreprint: DeepSide: A Deep Learning Framework for Drug Side Effect Prediction https://t.co/lCiMWWBIwW #bioRxiv
IchiroNakanoNS: RT @biorxivpreprint: DeepSide: A Deep Learning Framework for Drug Side Effect Prediction https://t.co/lCiMWWBIwW #bioRxiv
martielafreitas: RT @biorxiv_bioinfo: DeepSide: A Deep Learning Framework for Drug Side Effect Prediction https://t.co/PPmoRDlQEZ #biorxiv_bioinfo
PolymorphismJ: RT @biorxivpreprint: DeepSide: A Deep Learning Framework for Drug Side Effect Prediction https://t.co/lCiMWWBIwW #bioRxiv
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 0
Unique Words: 0

2.015 Mikeys
#6. FastFeatGen: Faster parallel feature extraction from genome sequences and efficient prediction of DNA N6-methyladenine sites
Md. Khaledur Rahman
N6-methyladenine is widely found in both prokaryotes and eukaryotes. It is responsible for many biological processes including prokaryotic defense system and human diseases. So, it is important to know its correct location in genome which may play a significant role in different biological functions. Few computational tools exist to serve this purpose but they are computationally expensive and still there is scope to improve accuracy. An informative feature extraction pipeline from genome sequences is the heart of these tools as well as for many other bioinformatics tools. But it becomes reasonably expensive for sequential approaches when the size of data is large. Hence, a scalable parallel approach is highly desirable. In this paper, we have developed a new tool, called FastFeatGen, emphasizing both developing a parallel feature extraction technique and improving accuracy using machine learning methods. We have implemented our feature extraction approach using shared memory parallelism which achieves around 10x speed over the...
Figures
Tweets
biorxivpreprint: FastFeatGen: Faster parallel feature extraction from genome sequences and efficient prediction of DNA N6-methyladenine sites https://t.co/Rjgh99Lzdu #bioRxiv
biorxiv_bioinfo: FastFeatGen: Faster parallel feature extraction from genome sequences and efficient prediction of DNA N6-methyladenine sites https://t.co/64sxDd6pgK #biorxiv_bioinfo
Github

Faster parallel feature extraction from genome sequence

Repository: FastFeatGen
User: khaled-rahman
Language: Jupyter Notebook
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 5936
Unique Words: 1862

2.011 Mikeys
#7. Improved protein structure prediction using predicted inter-residue orientations
Jianyi Yang, Ivan Anishchenko, Hahnbeom Park, Zhenling Peng, Sergey Ovchinnikov, David Baker
The prediction of inter-residue contacts and distances from co-evolutionary data using deep learning has considerably advanced protein structure prediction. Here we build on these advances by developing a deep residual network for predicting inter-residue orientations in addition to distances, and a Rosetta constrained energy minimization protocol for rapidly and accurately generating structure models guided by these restraints. In benchmark tests on CASP13 and CAMEO derived sets, the method outperforms all previously described structure prediction methods. Although trained entirely on native proteins, the network consistently assigns higher probability to de novo designed proteins, identifying the key fold determining residues and providing an independent quantitative measure of the "ideality" of a protein structure. The method promises to be useful for a broad range of protein structure prediction and design problems.
Figures
Tweets
Ag_smith: RT @biorxivpreprint: Improved protein structure prediction using predicted inter-residue orientations https://t.co/9TDvtpY950 #bioRxiv
nob_mai: RT @biorxivpreprint: Improved protein structure prediction using predicted inter-residue orientations https://t.co/9TDvtpY950 #bioRxiv
hdeshmuk: RT @biorxiv_bioinfo: Improved protein structure prediction using predicted inter-residue orientations https://t.co/dQVZvzWOQW #biorxiv_bio…
Github

A package to predict protein inter-residue geometries from sequence data

Repository: trRosetta
User: gjoni
Language: Python
Stargazers: 5
Subscribers: 3
Forks: 1
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 6
Total Words: 8753
Unique Words: 2515

2.009 Mikeys
#8. A hybrid model for predicting pattern recognition receptors using evolutionary information
Dilraj Kaur, Chakit Arora, Gajendra P.S. Raghava
This study describes a method developed for predicting pattern recognition receptors (PRRs), which are an integral part of the immune system. The models developed here were trained and evaluated on the largest possible non-redundant PRRs, and non-pattern recognition receptors (Non-PRRs) obtained from PRRDB 2.0. Firstly, a similarity-based approach using BLAST was used to predict PRRs and got limited success due to a large number of no-hits. Secondly, machine learning-based models were developed using sequence composition and achieved a maximum MCC of 0.63. In addition to this, models were developed using evolutionary information in the form of PSSM composition and achieved maximum MCC value of 0.66. Finally, we developed hybrid models that combined a similarity-based approach using BLAST and machine learning-based models. Our best model, which combined BLAST and PSSM based model, achieved a maximum MCC value of 0.82 with an AUROC value of 0.95, utilizing the potential of both similarity-based search and machine learning...
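For readers unfamiliar with the metric quoted above, the Matthews correlation coefficient (MCC) can be computed from a binary confusion matrix as sketched below. This is the standard definition, not code from the paper; the function name `mcc` and the zero-denominator convention (returning 0.0) are choices made for this illustration.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level)
    to +1 (perfect prediction).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Convention assumed here: undefined cases are reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denom


print(mcc(tp=50, tn=50, fp=0, fn=0))    # perfect classifier
print(mcc(tp=25, tn=25, fp=25, fn=25))  # chance-level classifier
```

Because MCC uses all four cells of the confusion matrix, it stays informative on imbalanced data sets, which is one reason it is often reported alongside AUROC.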
Figures
Tweets
biorxivpreprint: A hybrid model for predicting pattern recognition receptors using evolutionary information https://t.co/rngwc29mFB #bioRxiv
biorxiv_bioinfo: A hybrid model for predicting pattern recognition receptors using evolutionary information https://t.co/1n3WjH6XD6 #biorxiv_bioinfo
Github
None.
Youtube
None.
Other stats
Sample Sizes: [20]
Authors: 3
Total Words: 6795
Unique Words: 2260

2.005 Mikeys
#9. Evaluation of Connectivity Map shows limited reproducibility in drug repositioning
Nathaniel Lim, Paul Pavlidis
The Connectivity Map (CMap) is a popular resource designed for data-driven drug repositioning using a large transcriptomic compendium. However, evaluations of its performance are limited. We used two iterations of CMap (CMap 1 and 2) to assess their comparability and reliability. We queried CMap 2 with CMap 1-derived signatures, expecting CMap 2 would highly prioritize the queried compounds; success rate was 17%. Analysis of previously published prioritizations yielded similar results. Low recall is caused by low differential expression (DE) reproducibility both between CMaps and within each CMap. DE strength was predictive of reproducibility, and is influenced by compound concentration and cell-line responsiveness. Reproducibility of CMap 2 sample expression levels was also lower than expected. We attempted to identify the "better" CMap by comparison with a third dataset, but they were mutually discordant. Our findings have implications for CMap usage and we suggest steps for investigators to limit false positives.
Figures
None.
Tweets
allmeasures: Connectivity map is terrible https://t.co/WXUaLjFKsQ: "We queried CMap 2 with CMap 1-derived signatures, expecting CMap 2 would highly prioritize the queried compounds; success rate was 17%."
JimJohnsonSci: RT @biorxivpreprint: Evaluation of Connectivity Map shows limited reproducibility in drug repositioning https://t.co/qkIROFT0pV #bioRxiv
RyanDhindsa: RT @biorxivpreprint: Evaluation of Connectivity Map shows limited reproducibility in drug repositioning https://t.co/qkIROFT0pV #bioRxiv
IchiroNakanoNS: RT @biorxivpreprint: Evaluation of Connectivity Map shows limited reproducibility in drug repositioning https://t.co/qkIROFT0pV #bioRxiv
JScelza: RT @biorxivpreprint: Evaluation of Connectivity Map shows limited reproducibility in drug repositioning https://t.co/qkIROFT0pV #bioRxiv
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

2.004 Mikeys
#10. Comprehensive biological interpretation of gene signatures using semantic distributed representation
Yuumi Okuzono, Takashi Hoshino
Recent rise of microarray and next-generation sequencing in genome-related fields has simplified obtaining gene expression data at whole gene level, and biological interpretation of gene signatures related to life phenomena and diseases has become very important. However, the conventional method is numerical comparison of gene signature, pathway, and gene ontology (GO) overlap and distribution bias, and it is not possible to compare the specificity and importance of genes contained in gene signatures as humans do. This study proposes the gene signature vector (GsVec), a unique method for interpreting gene signatures that clarifies the semantic relationship between gene signatures by incorporating a method of distributed document representation from natural language processing (NLP). In proposed algorithm, a gene-topic vector is created by multiplying the feature vector based on the gene's distributed representation by the probability of the gene signature topic and the low frequency of occurrence of the corresponding gene in all...
Figures
None.
Tweets
razoralign: GsVec: Comprehensive biological interpretation of gene signatures using semantic distributed representation https://t.co/fUcQ30F9o4 https://t.co/Kty9PY3DG6
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter activity).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 225,391 papers.
