Top 1 bioRxiv Paper Today in Synthetic Biology


2.012 Mikeys
#1. Universal Deep Sequence Models for Protein Classification
Nils Strodthoff, Patrick Wagner, Markus Wenzel, Wojciech Samek
Inferring the properties of a protein from its amino acid sequence is one of the key problems in bioinformatics. Most state-of-the-art approaches to protein classification are tailored to a specific classification task and rely on handcrafted features, such as position-specific scoring matrices obtained from expensive database searches, achieving impressive performance on their respective tasks. We argue that a similar level of performance can be reached by leveraging the vast amount of unlabeled protein sequence data available in protein sequence databases, using a generic architecture that is not tailored to the specific classification task under consideration. To this end, we put forward UDSMProt, a universal deep sequence model that is pretrained on a language-modeling task on the Swiss-Prot database and finetuned on various protein classification tasks. For three different tasks, namely enzyme class prediction, gene ontology prediction, and remote homology and fold detection, we demonstrate the feasibility of inferring protein...
more | pdf
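The abstract describes a two-stage, ULMFiT-style workflow: self-supervised language-model pretraining on Swiss-Prot sequences, followed by fine-tuning of the same encoder on a supervised classification task. The sketch below illustrates that general workflow only; it is not the authors' UDSMProt code. It assumes PyTorch, uses a plain LSTM in place of ULMFiT's AWD-LSTM, and the hyperparameters and six-class target are toy values chosen for illustration.

```python
# Hypothetical sketch of a pretrain-then-finetune workflow for protein sequences.
# Not the UDSMProt implementation; architecture and sizes are illustrative assumptions.
import torch
import torch.nn as nn

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
vocab = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}  # index 0 reserved for padding

class SequenceEncoder(nn.Module):
    """Shared encoder: embeds amino-acid tokens and runs them through an LSTM."""
    def __init__(self, vocab_size=len(vocab) + 1, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        out, _ = self.lstm(self.embed(tokens))  # out: (batch, seq_len, hidden_dim)
        return out

class LanguageModelHead(nn.Module):
    """Stage 1: predict the next amino acid at every position (self-supervised)."""
    def __init__(self, encoder, vocab_size=len(vocab) + 1):
        super().__init__()
        self.encoder = encoder
        self.decoder = nn.Linear(128, vocab_size)

    def forward(self, tokens):
        return self.decoder(self.encoder(tokens))

class ClassifierHead(nn.Module):
    """Stage 2: reuse the pretrained encoder, pool over positions, classify."""
    def __init__(self, encoder, num_classes):
        super().__init__()
        self.encoder = encoder
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, tokens):
        hidden = self.encoder(tokens)           # (batch, seq_len, hidden_dim)
        return self.classifier(hidden.mean(dim=1))

# Toy usage: one pretraining step on next-token prediction, then fine-tune for a
# made-up six-class prediction target (e.g. top-level enzyme classes).
encoder = SequenceEncoder()
lm = LanguageModelHead(encoder)
seqs = torch.randint(1, len(vocab) + 1, (8, 50))           # fake batch of sequences
lm_loss = nn.functional.cross_entropy(
    lm(seqs[:, :-1]).reshape(-1, len(vocab) + 1), seqs[:, 1:].reshape(-1))
lm_loss.backward()                                          # pretraining gradient step

clf = ClassifierHead(encoder, num_classes=6)                # encoder weights carry over
logits = clf(seqs)                                          # (8, 6) class scores
```

The point of the sketch is only that the encoder object is shared between the two heads, so whatever it learns during language-model pretraining is available when the classifier is fine-tuned.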
Figures
None.
Tweets
biorxivpreprint: Universal Deep Sequence Models for Protein Classification https://t.co/zRDiwYnJil #bioRxiv
RT @nstrodt: Excited to share our recent preprint https://t.co/19D7q4mL6E turns out that language model pretraining a la ULMFiT by @seb_rud… (retweeted by: seb_ruder, sigitpurnomo, scholarcy, JoaoVictor_AC, PerthMLGroup, simecek, jmcimula, lucidrains, kotti_sasikanth, AssistedEvolve, adhaamehab, Feldman1Michael, karlafej, imtechmonk, brunoboutteau, GerardoEGarcia, your_faiz, ShuvenduBikash, foxhu007, JoanGibert4, samhardyhey, DeepHindsight)
RT @biorxivpreprint (tweet above; retweeted by: PhiloNeurosci, enerphyschem, ItanLab, gddiwan)
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 11494
Unique Words: 2926

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).
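Assert does not publish its scoring formula, so the snippet below is only an illustrative sketch of how a verifiability signal (a linked GitHub repo) and an interest signal (Twitter activity) might be combined into a single score; the weights, the log scaling, and the function names are assumptions, not the site's actual method.

```python
# Illustrative only: Assert's real scoring formula is not public, so the weights
# and scaling here are assumptions made for the example.
from dataclasses import dataclass
import math

@dataclass
class PaperSignals:
    has_github_repo: bool   # proxy for how verifiable the paper is
    tweet_count: int        # proxy for how interesting it is on Twitter

def score(paper: PaperSignals,
          repo_weight: float = 1.0,
          tweet_weight: float = 0.5) -> float:
    """Combine a reproducibility signal and a social-interest signal into one score."""
    verifiability = repo_weight if paper.has_github_repo else 0.0
    interest = tweet_weight * math.log1p(paper.tweet_count)  # damp viral outliers
    return verifiability + interest

# The paper above lists no GitHub repo and 27 tweets/retweets.
print(score(PaperSignals(has_github_repo=False, tweet_count=27)))
```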

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 160,434 papers.

Search
Sort results based on whether they are interesting or reproducible.
Interesting
Reproducible