Top 6 arXiv Papers Today in Computation and Language


2.049 Mikeys
#1. Low Resource Text Classification with ULMFit and Backtranslation
Sam Shleifer
In computer vision, virtually every state-of-the-art deep learning system is trained with data augmentation. In text classification, however, data augmentation is less widely practiced because it must be performed before training and risks introducing label noise. We augment the IMDB movie reviews dataset with examples generated by two families of techniques: random token perturbations introduced by Wei and Zou [2019] and backtranslation (translating to a second language, then back to English). In low-resource environments, backtranslation yields significant improvements on top of the state-of-the-art ULMFit model. A ULMFit model pretrained on WikiText-103 and then fine-tuned on only 50 IMDB examples and 500 synthetic examples generated by backtranslation achieves 80.6% accuracy, an 8.1% improvement over the augmentation-free baseline, with only 9 minutes of additional training time. Random token perturbations do not yield any improvements but incur equivalent computational cost. The benefits of training with backtranslated...
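Both augmentation families in the abstract can be sketched as label-preserving transforms over (text, label) pairs. This is a minimal Python sketch, assuming whitespace tokenization: it implements only two of Wei and Zou's token perturbations (random swap and random deletion; the synonym-based ones need a thesaurus and are omitted), and `backtranslate` takes the two translation steps as caller-supplied functions because no particular MT system is assumed. The helper names, the `alpha` rate, and the choice to drop round trips that come back unchanged are illustrative assumptions, not the paper's code.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[str, int]  # (review text, sentiment label)

def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap two randomly chosen positions, n_swaps times."""
    out = list(tokens)
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def perturb(examples: List[Example], n_aug: int = 2, alpha: float = 0.1,
            seed: int = 0) -> List[Example]:
    """Make n_aug perturbed copies per example, alternating swap and deletion."""
    rng = random.Random(seed)
    out: List[Example] = []
    for text, label in examples:
        tokens = text.split()
        n_swaps = max(1, int(alpha * len(tokens)))
        for k in range(n_aug):
            if k % 2 == 0:
                new = random_swap(tokens, n_swaps, rng)
            else:
                new = random_deletion(tokens, alpha, rng)
            out.append((" ".join(new), label))  # label copied unchanged
    return out

def backtranslate(examples: List[Example],
                  to_pivot: Callable[[str], str],
                  to_english: Callable[[str], str]) -> List[Example]:
    """Round-trip each text through a pivot language; keep the label.

    to_pivot / to_english are placeholders for any MT system (assumption).
    """
    out: List[Example] = []
    for text, label in examples:
        para = to_english(to_pivot(text))
        if para != text:  # skip copies that add nothing new (assumption)
            out.append((para, label))
    return out
```

Every synthetic copy inherits its source label unchanged, which is where the label-noise risk comes from: a deletion or a loose paraphrase can change a review's sentiment without changing its label.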
Tweets
arxiv_org: Low Resource Text Classification with ULMFit and Backtranslation. https://t.co/IDTtzZ3746 https://t.co/1b3LbA448x
arxivml: "Low Resource Text Classification with ULMFit and Backtranslation", Sam Shleifer https://t.co/HP4V9PulG0
SigP226: #stanfordnlp RT pnderthevstnes: ULMFit from fastai + Data Augmentation with backtranslation can get 80+% validation accuracy using only 50 training examples on #NLP IMDB sentiment classification! Full paper for #cs224n at https://t.co/n7r93xlJZ5. Thread below.
hereticreader: Low Resource Text Classification with ULMFit and Backtranslation - https://t.co/9D1COPFWPY https://t.co/tiNbdfMNWg
Memoirs: Low Resource Text Classification with ULMFit and Backtranslation. https://t.co/QHXGDetFWp
pnderthevstnes: ULMFit from @fastai + Data Augmentation with backtranslation can get 80+% validation accuracy using only 50 training examples on #NLP IMDB sentiment classification! Full paper for #cs224n at https://t.co/OlivTFOG3G. Thread below.
arxiv_cscl: Low Resource Text Classification with ULMFit and Backtranslation https://t.co/zSHyJ1sp34
subhobrata1: RT @arxiv_org: Low Resource Text Classification with ULMFit and Backtranslation. https://t.co/IDTtzZ3746 https://t.co/1b3LbA448x
thapraveensingh: RT @arxiv_org: Low Resource Text Classification with ULMFit and Backtranslation. https://t.co/IDTtzZ3746 https://t.co/1b3LbA448x
Github

Backtranslations of IMDB movie reviews for Data Augmentation Purposes

Repository: backtranslated-imdb
User: sshleifer
Language: None
Stargazers: 1
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: [50, 1000]
Authors: 1
Total Words: 3769
Unique Words: 1421

2.031 Mikeys
#2. Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence
Chi Sun, Luyao Huang, Xipeng Qiu
Aspect-based sentiment analysis (ABSA), which aims to identify fine-grained opinion polarity towards a specific aspect, is a challenging subtask of sentiment analysis (SA). In this paper, we construct an auxiliary sentence from the aspect and convert ABSA into a sentence-pair classification task, similar to question answering (QA) and natural language inference (NLI). We fine-tune the pre-trained BERT model and achieve new state-of-the-art results on the SentiHood and SemEval-2014 Task 4 datasets.
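To make the conversion concrete, here is a sketch of turning one review sentence into one sentence pair per aspect, laid out in the standard BERT sentence-pair format. The template wording in `qa_style` and `nli_style` is a hypothetical approximation of question-style and NLI-style auxiliary sentences, and all function names are illustrative; each resulting pair would then be classified for polarity by the fine-tuned model.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (original sentence, auxiliary sentence)

def qa_style(target: str, aspect: str) -> str:
    """Question-style auxiliary sentence (hypothetical wording)."""
    return f"what do you think of the {aspect} of {target} ?"

def nli_style(target: str, aspect: str) -> str:
    """Pseudo-sentence built from just the target and aspect (hypothetical wording)."""
    return f"{target} - {aspect}"

def to_pairs(sentence: str, target: str, aspects: List[str],
             style: Callable[[str, str], str] = qa_style) -> List[Pair]:
    """One (sentence, auxiliary) pair per aspect; each pair is classified separately."""
    return [(sentence, style(target, a)) for a in aspects]

def bert_input(pair: Pair) -> str:
    """Standard BERT sentence-pair layout: [CLS] A [SEP] B [SEP]."""
    a, b = pair
    return f"[CLS] {a} [SEP] {b} [SEP]"
```

For example, `to_pairs("location - 1 is safe but pricey", "location - 1", ["safety", "price"])` yields two pairs, so a single sentence with two aspects becomes two independent classification inputs.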
Figures
None.
Tweets
Miles_Brundage: "Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence," Sun et al.: https://t.co/ysYNAs4vbi
arxivml: "Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence", Chi Sun, Luyao Huang, Xip… https://t.co/2Z8Vgt3EbF
arxiv_cscl: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence https://t.co/VQz2J8pdCj
ComputerPapers: Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence. https://t.co/WFV0h4P6qG
Github

Utilizing BERT for Aspect-Based Sentiment Analysis via Constructing Auxiliary Sentence (NAACL 2019)

Repository: ABSA-BERT-pair
User: HSLCY
Language: Python
Stargazers: 7
Subscribers: 1
Forks: 3
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 3510
Unique Words: 1352

2.024 Mikeys
#3. LINSPECTOR: Multilingual Probing Tasks for Word Representations
Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Iryna Gurevych
Despite an ever-growing number of word representation models introduced for a large number of languages, there is no standardized technique for gaining insight into what these models capture. Such insights would help the community estimate downstream task performance and design more informed neural architectures, while avoiding extensive experimentation that requires substantial computational resources not all researchers have access to. A recent development in NLP is the use of simple classification tasks, also called probing tasks, that each test for a single linguistic feature such as part-of-speech. Existing studies mostly focus on exploring the information encoded by sentence-level representations for English. However, from a typological perspective, morphologically poor English is something of an outlier: the information encoded by word order and function words in English is often stored at the subword, morphological level in other languages. To address this, we introduce 15 word-level...
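A probing task in this sense is a deliberately simple classifier trained on frozen representations. The sketch below, in dependency-free Python, fits a binary logistic-regression probe by plain SGD; the toy 2-D vectors and the singular/plural feature are assumptions standing in for real word embeddings and one of LINSPECTOR's features, and the learning rate and epoch count are arbitrary choices, not the paper's setup.

```python
import math
from typing import List, Sequence

def _score(w: Sequence[float], x: Sequence[float]) -> float:
    """Linear score w.x + b; the bias b is stored as the last weight."""
    return sum(wj * xj for wj, xj in zip(w, x)) + w[-1]

def train_probe(X: Sequence[Sequence[float]], y: Sequence[int],
                lr: float = 0.5, epochs: int = 200) -> List[float]:
    """Fit a binary logistic-regression probe; the vectors in X stay frozen."""
    dim = len(X[0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = max(-60.0, min(60.0, _score(w, xi)))  # clamp so exp() stays finite
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # derivative of the log loss with respect to z
            for j in range(dim):
                w[j] -= lr * g * xi[j]
            w[-1] -= lr * g
    return w

def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if _score(w, x) > 0 else 0

def accuracy(w: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    return sum(predict(w, x) == t for x, t in zip(X, y)) / len(y)

# Toy 2-D vectors standing in for frozen word embeddings; label 1 = plural.
X = [[1.0, 0.2], [0.9, -0.1], [1.1, 0.0], [-1.0, 0.1], [-0.8, -0.2], [-1.2, 0.3]]
y = [0, 0, 0, 1, 1, 1]
w = train_probe(X, y)
print(f"probe accuracy: {accuracy(w, X, y):.2f}")
```

Keeping the probe linear is the point: if a weak classifier can recover the feature, the information is plausibly encoded in the representation itself rather than learned by the probe.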
Tweets
arxivml: "LINSPECTOR: Multilingual Probing Tasks for Word Representations", Gözde Gül Şahin, Clara Vania, Ilia Kuznetsov, Ir… https://t.co/whvKTt0gwr
arxiv_cscl: LINSPECTOR: Multilingual Probing Tasks for Word Representations https://t.co/r5DUyiCW5k
ComputerPapers: LINSPECTOR: Multilingual Probing Tasks for Word Representations. https://t.co/AyJuX5mJn9
Github
Repository: linspector
User: UKPLab
Language: Python
Stargazers: 4
Subscribers: 17
Forks: 0
Open Issues: 1
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 15092
Unique Words: 4346

2.022 Mikeys
#4. An end-to-end Neural Network Framework for Text Clustering
Jie Zhou, Xingyi Cheng, Jinchao Zhang
Unsupervised text clustering is one of the major tasks in natural language processing (NLP) and remains a difficult and complex problem. Conventional methods generally split this task into separate steps: learning text representations and then clustering those representations. As an improvement, neural methods have been introduced for continuous representation learning to address the sparsity problem. However, the multi-step process still deviates from a unified optimization target. In particular, the second step, clustering, is generally performed with conventional methods such as k-Means. We propose a purely neural framework for text clustering in an end-to-end manner. It jointly learns the text representation and the clustering model. Our model works well when context can be obtained, which is nearly always the case in NLP. We evaluate our method on two widely used benchmarks: IMDB movie reviews for sentiment classification and 20-Newsgroup for topic categorization. Despite its...
Tweets
arxivml: "An end-to-end Neural Network Framework for Text Clustering", Jie Zhou, Xingyi Cheng, Jinchao Zhang https://t.co/XuCy8y1YXT
arxiv_cscl: An end-to-end Neural Network Framework for Text Clustering https://t.co/mtaHWykne6
arxiv_cscl: An end-to-end Neural Network Framework for Text Clustering https://t.co/mtaHWy2Mmy
ComputerPapers: An end-to-end Neural Network Framework for Text Clustering. https://t.co/aKLOrDuXbu
_Artemisa_v: RT @arxiv_cscl: An end-to-end Neural Network Framework for Text Clustering https://t.co/mtaHWykne6
nullbytep: RT @arxiv_cscl: An end-to-end Neural Network Framework for Text Clustering https://t.co/mtaHWy2Mmy
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 6193
Unique Words: 2170

2.021 Mikeys
#5. A Type-coherent, Expressive Representation as an Initial Step to Language Understanding
Gene Louis Kim, Lenhart Schubert
Growing interest within the NLP community in tasks involving language understanding has created a need for effective semantic parsing and inference. Modern NLP systems use semantic representations that do not quite fulfill the nuanced needs of language understanding: adequately modeling language semantics, enabling general inferences, and being accurately recoverable. This document describes underspecified logical forms (ULFs) for Episodic Logic (EL), an initial form of semantic representation that balances these needs. ULFs fully resolve the semantic type structure while leaving issues such as quantifier scope, word sense, and anaphora unresolved; they provide a starting point for further resolution into EL and enable certain structural inferences without further resolution. This document also presents preliminary results of creating a hand-annotated corpus of ULFs for the purpose of training a precise ULF parser, showing a three-person pairwise interannotator agreement of 0.88 on confident annotations. We hypothesize...
Tweets
arxivml: "A Type-coherent, Expressive Representation as an Initial Step to Language Understanding", Gene Louis Kim, Lenhart … https://t.co/Dpj7LbwRyI
SciFi: A Type-coherent, Expressive Representation as an Initial Step to Language Understanding. https://t.co/N5M6Xdct7z
arxiv_cscl: A Type-coherent, Expressive Representation as an Initial Step to Language Understanding https://t.co/jYTsmV740f
arxiv_cscl: A Type-coherent, Expressive Representation as an Initial Step to Language Understanding https://t.co/jYTsmUPt8H
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 9605
Unique Words: 3275

2.015 Mikeys
#6. Data Augmentation via Dependency Tree Morphing for Low-Resource Languages
Gözde Gül Şahin, Mark Steedman
Neural NLP systems achieve high scores when sizable training datasets are available. The lack of such datasets leads to poor system performance in the case of low-resource languages. We present two simple text augmentation techniques using dependency trees, inspired by image processing. We crop sentences by removing dependency links, and we rotate sentences by moving the tree fragments around the root. We apply these techniques to augment the training sets of low-resource languages in the Universal Dependencies project. We implement a character-level sequence tagging model and evaluate the augmented datasets on the part-of-speech tagging task. We show that crop and rotate provide improvements over models trained on non-augmented data for the majority of the languages, especially for languages with rich case marking systems.
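The crop and rotate operations can be sketched on a toy dependency tree. In this minimal Python sketch, each token carries a UPOS tag, a 1-based head index (0 for the root), and a relation label; `crop` keeps the root plus one of its dependent subtrees, and `rotate` re-emits the root and its dependent subtrees in a caller-chosen order. The names, the assumption that each relation occurs at most once under the root, and the decision to drop all other root dependents when cropping are simplifications for illustration, not the authors' implementation.

```python
from typing import Dict, List, NamedTuple, Tuple

class Tok(NamedTuple):
    form: str
    upos: str
    head: int    # 1-based index of the head token; 0 marks the root
    deprel: str

def subtree(sent: List[Tok], i: int) -> List[int]:
    """1-based indices of token i and all of its descendants, in sentence order."""
    keep = {i}
    changed = True
    while changed:
        changed = False
        for j, t in enumerate(sent, start=1):
            if t.head in keep and j not in keep:
                keep.add(j)
                changed = True
    return sorted(keep)

def fragments(sent: List[Tok]) -> Dict[str, List[int]]:
    """Map 'ROOT' and each root dependent's relation to the indices it spans.

    Assumes each relation label occurs at most once directly under the root.
    """
    r = next(j for j, t in enumerate(sent, start=1) if t.head == 0)
    frags = {"ROOT": [r]}
    for j, t in enumerate(sent, start=1):
        if t.head == r:
            frags[t.deprel] = subtree(sent, j)
    return frags

def crop(sent: List[Tok], keep_rel: str) -> List[Tuple[str, str]]:
    """Keep the root plus the subtree of one root dependent (e.g. 'nsubj' or 'obj')."""
    frags = fragments(sent)
    idx = sorted(frags["ROOT"] + frags[keep_rel])
    return [(sent[j - 1].form, sent[j - 1].upos) for j in idx]

def rotate(sent: List[Tok], order: List[str]) -> List[Tuple[str, str]]:
    """Re-emit the root and its dependent fragments in the given relation order."""
    frags = fragments(sent)
    if sorted(order) != sorted(frags):
        raise ValueError("order must list every fragment exactly once")
    return [(sent[j - 1].form, sent[j - 1].upos) for rel in order for j in frags[rel]]

sent = [Tok("she", "PRON", 2, "nsubj"), Tok("wrote", "VERB", 0, "root"),
        Tok("a", "DET", 4, "det"), Tok("letter", "NOUN", 2, "obj")]
```

Because every token keeps its POS tag through the transformation, each cropped or rotated sentence is a new, fully labeled training example for the tagger.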
Figures
None.
Tweets
arxivml: "Data Augmentation via Dependency Tree Morphing for Low-Resource Languages", Gözde Gül Şahin, Mark Steedman https://t.co/IJ1Qz4CAA3
arxiv_cscl: Data Augmentation via Dependency Tree Morphing for Low-Resource Languages https://t.co/pWYWe7Dwcs
ComputerPapers: Data Augmentation via Dependency Tree Morphing for Low-Resource Languages. https://t.co/BjHg9VBfON
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 100,376 papers.
