Top 10 arXiv Papers Today in Computation and Language


2.198 Mikeys
#1. Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
Yuan Luo, Peter Szolovits
This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It utilizes UMLS to perform domain adaptation when integrating generic domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval tree based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations to enable the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP facilitated tasks including computational phenotyping for lymphoma patients and semantic relation extraction for clinical...
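To illustrate the stand-off annotations described in the abstract above, here is a minimal Python sketch in which each annotation refers to the source text only by character offsets, and a positional query returns the annotations overlapping a given range. It uses a plain linear scan, not the interval tree the authors built, and is not taken from LAPNLP. The `Annotation` class, the `overlapping` function, the sample sentence, and the labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    """A stand-off annotation: a label plus character offsets into the text."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # hypothetical label, for illustration only


def overlapping(annotations: List[Annotation], start: int, end: int) -> List[Annotation]:
    """Return annotations whose span overlaps the half-open range [start, end).

    This is a linear scan; an interval tree would answer the same query
    without visiting every annotation.
    """
    return [a for a in annotations if a.start < end and start < a.end]


# The text is never modified; annotations only point into it.
text = "Patient denies chest pain."
annotations = [
    Annotation(0, 7, "PERSON_ROLE"),
    Annotation(8, 14, "NEGATION"),
    Annotation(15, 25, "SYMPTOM"),
]

# Positional query: which annotations touch characters 10 through 19?
hits = overlapping(annotations, 10, 20)
covered = [text[a.start:a.end] for a in hits]
print(covered)
```

Keeping annotations separate from the text means several tools can annotate the same document without rewriting it, which is the property that makes a positional index useful.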
Tweets
SciFi: Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective. https://t.co/OHd1dMdI4i
arxiv_cscl: Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective https://t.co/NMruk7diiW
arxiv_cscl: Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective https://t.co/NMruk7uTau
Github

LISP Architecture for Portable Natural Language Processing

Repository: lapnlp
User: yuanluo
Language: Common Lisp
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 5010
Unique Words: 1825

2.182 Mikeys
#2. Combining Axiom Injection and Knowledge Base Completion for Efficient Natural Language Inference
Masashi Yoshikawa, Koji Mineshima, Hiroshi Noji, Daisuke Bekki
In logic-based approaches to reasoning tasks such as Recognizing Textual Entailment (RTE), it is important for a system to have a large amount of knowledge data. However, there is a tradeoff between adding more knowledge data for improved RTE performance and maintaining an efficient RTE system, as such a big database is problematic in terms of the memory usage and computational complexity. In this work, we show the processing time of a state-of-the-art logic-based RTE system can be significantly reduced by replacing its search-based axiom injection (abduction) mechanism by that based on Knowledge Base Completion (KBC). We integrate this mechanism in a Coq plugin that provides a proof automation tactic for natural language inference. Additionally, we show empirically that adding new knowledge data contributes to better RTE performance while not harming the processing speed in this framework.
Tweets
blankeyelephant: Combining Axiom Injection and Knowledge Base Completion for Efficient Natural Language Inference arXiv: https://t.co/FkBXCfBN77
SciFi: Combining Axiom Injection and Knowledge Base Completion for Efficient Natural Language Inference. https://t.co/lrsYvVb7Cz
arxiv_cscl: Combining Axiom Injection and Knowledge Base Completion for Efficient Natural Language Inference https://t.co/lLeKOi2iWn
Github
Repository: abduction_kbc
User: masashi-y
Language: OCaml
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 6763
Unique Words: 2280

2.167 Mikeys
#3. Exploiting Sentence Embedding for Medical Question Answering
Yu Hao, Xien Liu, Ji Wu, Ping Lv
Despite the great success of word embedding, sentence embedding remains a not-well-solved problem. In this paper, we present a supervised learning framework to exploit sentence embedding for the medical question answering task. The learning framework consists of two main parts: 1) a sentence embedding producing module, and 2) a scoring module. The former is developed with contextual self-attention and multi-scale techniques to encode a sentence into an embedding tensor. This module is called Contextual self-Attention Multi-scale Sentence Embedding (CAMSE) for short. The latter employs two scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association Scoring (SAS). SMS measures similarity while SAS captures association between sentence pairs: a medical question concatenated with a candidate choice, and a piece of corresponding supportive evidence. The proposed framework is examined by two Medical Question Answering (MedicalQA) datasets which are collected from real-world applications: medical exam and clinical diagnosis...
Figures
None.
Tweets
BrundageBot: Exploiting Sentence Embedding for Medical Question Answering. Yu Hao, Xien Liu, Ji Wu, and Ping Lv https://t.co/7XVZRUOp1c
arxivml: "Exploiting Sentence Embedding for Medical Question Answering", Yu Hao, Xien Liu, Ji Wu, Ping Lv https://t.co/IZFAtpp2zl
SciFi: Exploiting Sentence Embedding for Medical Question Answering. https://t.co/CRD8RisTr0
arxiv_cscl: Exploiting Sentence Embedding for Medical Question Answering https://t.co/gWMWPYTj94
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 6135
Unique Words: 2125

2.131 Mikeys
#4. Characterizing Design Patterns of EHR-Driven Phenotype Extraction Algorithms
Yizhen Zhong, Luke Rasmussen, Yu Deng, Jennifer Pacheco, Maureen Smith, Justin Starren, Wei-Qi Wei, Peter Speltz, Joshua Denny, Nephi Walton, George Hripcsak, Christopher G Chute, Yuan Luo
The automatic development of phenotype algorithms from Electronic Health Record data with machine learning (ML) techniques is of great interest given the current practice is very time-consuming and resource intensive. The extraction of design patterns from phenotype algorithms is essential to understand their rationale and standard, with great potential to automate the development process. In this pilot study, we perform network visualization on the design patterns and their associations with phenotypes and sites. We classify design patterns using the fragments from previously annotated phenotype algorithms as the ground truth. The classification performance is used as a proxy for coherence at the attribution level. The bag-of-words representation with knowledge-based features generated a good performance in the classification task (0.79 macro-f1 scores). Good classification accuracy with simple features demonstrated the attribution coherence and the feasibility of automatic identification of design patterns. Our results point to...
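The abstract above reports a macro-F1 score. As a reminder of what that metric measures, here is a minimal Python sketch that computes per-class F1 and averages it with equal weight per class. The class names and predictions are made up for illustration and are not from the paper's data; `macro_f1` is a hypothetical helper, not the authors' evaluation code.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical design-pattern labels, for illustration only.
y_true = ["lab", "lab", "med", "dx"]
y_pred = ["lab", "med", "med", "dx"]

score = macro_f1(y_true, y_pred)
print(round(score, 3))
```

Because every class counts equally, macro-F1 penalizes poor performance on rare classes more than accuracy or micro-averaged F1 would.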
Tweets
SciFi: Characterizing Design Patterns of EHR-Driven Phenotype Extraction Algorithms. https://t.co/i19WaJrie5
arxiv_cscl: Characterizing Design Patterns of EHR-Driven Phenotype Extraction Algorithms https://t.co/ymEFWnKndp
Github
Repository: PhenoPattern
User: yizhenzhong
Language: Jupyter Notebook
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 13
Total Words: 3310
Unique Words: 1355

2.104 Mikeys
#5. A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks
Victor Sanh, Thomas Wolf, Sebastian Ruder
Much effort has been devoted to evaluating whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) down-stream applications. However, there is still a lack of understanding of the settings in which multi-task learning has a significant effect. In this work, we introduce a hierarchical model trained in a multi-task learning setup on a set of carefully selected semantic tasks. The model is trained in a hierarchical fashion to introduce an inductive bias by supervising a set of low level tasks at the bottom layers of the model and more complex tasks at the top layers of the model. This model achieves state-of-the-art results on a number of tasks, namely Named Entity Recognition, Entity Mention Detection and Relation Extraction without hand-engineered features or external NLP tools like syntactic parsers. The hierarchical training supervision induces a set of shared semantic representations at lower layers of the model. We show that as we move from the...
Figures
None.
Tweets
BrundageBot: A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks. Victor Sanh, Thomas Wolf, and Sebastian Ruder https://t.co/eAtcL24Mq4
arxiv_cscl: A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks https://t.co/AY0e5rZQvA
ComputerPapers: A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks. https://t.co/TydOoKi0i3
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 7073
Unique Words: 2354

2.104 Mikeys
#6. Effect of data reduction on sequence-to-sequence neural TTS
Javier Latorre, Jakub Lachowicz, Jaime Lorenzo-Trueba, Thomas Merritt, Thomas Drugman
Recent speech synthesis systems based on sampling from autoregressive neural networks models can generate speech almost undistinguishable from human recordings. However, these models require large amounts of data. This paper shows that the lack of data from one speaker can be compensated with data from other speakers. The naturalness of Tacotron2-like models trained on a blend of 5k utterances from 7 speakers is better than that of speaker dependent models trained on 15k utterances, but in terms of stability multi-speaker models are always more stable. We also demonstrate that models mixing only 1250 utterances from a target speaker with 5k utterances from another 6 speakers can produce significantly better quality than state-of-the-art DNN-guided unit selection systems trained on more than 10 times the data from the target speaker.
Tweets
BrundageBot: Effect of data reduction on sequence-to-sequence neural TTS. Javier Latorre, Jakub Lachowicz, Jaime Lorenzo-Trueba, Thomas Merritt, and Thomas Drugman https://t.co/7VSVjTUJ2u
arxiv_cscl: Effect of data reduction on sequence-to-sequence neural TTS https://t.co/YsUDZ9HIEu
ComputerPapers: Effect of data reduction on sequence-to-sequence neural TTS. https://t.co/hDcs5VlzFK
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 4144
Unique Words: 1432

2.103 Mikeys
#7. End-to-End Learning for Answering Structured Queries Directly over Text
Paul Groth, Anthony Scerri, Ron Daniel, Jr., Bradley P. Allen
Structured queries expressed in languages (such as SQL, SPARQL, or XQuery) offer a convenient and explicit way for users to express their information needs for a number of tasks. In this work, we present an approach to answer these directly over text data without storing results in a database. We specifically look at the case of knowledge bases where queries are over entities and the relations between them. Our approach combines distributed query answering (e.g. Triple Pattern Fragments) with models built for extractive question answering. Importantly, by applying distributed querying answering we are able to simplify the model learning problem. We train models for a large portion (572) of the relations within Wikidata and achieve an average 0.70 F1 measure across all models. We also present a systematic method to construct the necessary training data for this task from knowledge graphs and describe a prototype implementation.
Figures
None.
Tweets
pgroth: New preprint: [1811.06303] End-to-End Learning for Answering Structured Queries Directly over Text #sparql #triplepatternfragments #deeplearning https://t.co/5xIVHi3dQo
arxiv_cscl: End-to-End Learning for Answering Structured Queries Directly over Text https://t.co/Gc5VEeVJFd
ComputerPapers: End-to-End Learning for Answering Structured Queries Directly over Text. https://t.co/qLl2Bjg3am
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 7571
Unique Words: 2408

2.06 Mikeys
#8. Improving Distantly Supervised Relation Extraction with Neural Noise Converter and Conditional Optimal Selector
Shanchan Wu, Kai Fan, Qiong Zhang
Distant supervised relation extraction has been successfully applied to large corpus with thousands of relations. However, the inevitable wrong labeling problem by distant supervision will hurt the performance of relation extraction. In this paper, we propose a method with neural noise converter to alleviate the impact of noisy data, and a conditional optimal selector to make proper prediction. Our noise converter learns the structured transition matrix on logit level and captures the property of distant supervised relation extraction dataset. The conditional optimal selector on the other hand helps to make proper prediction decision of an entity pair even if the group of sentences is overwhelmed by no-relation sentences. We conduct experiments on a widely used dataset and the results show significant improvement over competitive baseline methods.
Tweets
SciFi: Improving Distantly Supervised Relation Extraction with Neural Noise Converter and Conditional Optimal Selector. https://t.co/jDpPWHslzn
arxiv_cscl: Improving Distantly Supervised Relation Extraction with Neural Noise Converter and Conditional Optimal Selector https://t.co/v4sOash8Re
arxivml: "Improving Distantly Supervised Relation Extraction with Neural Noise Converter and Conditional Optimal Selector", … https://t.co/nW1a2rq2aP
Github

An Open-Source Package for Neural Relation Extraction (NRE) implemented in TensorFlow

Repository: OpenNRE
User: thunlp
Language: Python
Stargazers: 952
Subscribers: 87
Forks: 373
Open Issues: 4
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 6430
Unique Words: 1765

2.058 Mikeys
#9. Discourse in Multimedia: A Case Study in Information Extraction
Mrinmaya Sachan, Kumar Avinava Dubey, Eduard H. Hovy, Tom M. Mitchell, Dan Roth, Eric P. Xing
To ensure readability, text is often written and presented with due formatting. These text formatting devices help the writer to effectively convey the narrative. At the same time, these help the readers pick up the structure of the discourse and comprehend the conveyed information. There have been a number of linguistic theories on discourse structure of text. However, these theories only consider unformatted text. Multimedia text contains rich formatting features which can be leveraged for various NLP tasks. In this paper, we study some of these discourse features in multimedia text and what communicative function they fulfil in the context. We examine how these multimedia discourse features can be used to improve an information extraction system. We show that the discourse and text layout features provide information that is complementary to lexical semantic information commonly used for information extraction. As a case study, we use these features to harvest structured subject knowledge of geometry from textbooks. We show...
Tweets
arxiv_org: Discourse in Multimedia: A Case Study in Information Extraction. https://t.co/Dbfh3nKkse https://t.co/LUUjLB4R0s
arxivml: "Discourse in Multimedia: A Case Study in Information Extraction", Mrinmaya Sachan, Kumar Avinava Dubey, Eduard H. … https://t.co/WnHv31SGhQ
arxiv_cscl: Discourse in Multimedia: A Case Study in Information Extraction https://t.co/cGv4ZvGDPa
ComputerPapers: Discourse in Multimedia: A Case Study in Information Extraction. https://t.co/EjqLLNt2OK
Robots_and_AIs: RT @arxiv_cscl : Discourse in Multimedia: A Case Study in Information Extraction https://t.co/3KQihA0q6V https://t.co/5QBG5vKfRz
Robots_and_AIs: RT @ComputerPapers : Discourse in Multimedia: A Case Study in Information Extraction. https://t.co/3KQihA0q6V https://t.co/Ga8EhzPRDs
Robots_and_AIs: RT @arxiv_cscl : Discourse in Multimedia: A Case Study in Information Extraction https://t.co/3KQihA0q6V https://t.co/NeqittFpyd
Robots_and_AIs: RT @arxivml : "Discourse in Multimedia: A Case Study in Information Extraction", Mrinmaya Sachan, Kumar Avinava Dubey, Eduard H. … https://t.co/Tnm0dVZ8n0 https://t.co/GIjPuM0vKA
Robots_and_AIs: RT @arxiv_cscl : Discourse in Multimedia: A Case Study in Information Extraction https://t.co/3KQihA0q6V https://t.co/OdcDGisesi
Robots_and_AIs: RT @arxiv_org : Discourse in Multimedia: A Case Study in Information Extraction. https://t.co/3KQihA0q6V https://t.co/TgOiVuk5Pz https://t.co/vsVbyUpW9L
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 6
Total Words: 16166
Unique Words: 4384

2.053 Mikeys
#10. Few-shot Learning for Named Entity Recognition in Medical Text
Maximilian Hofer, Andrey Kormilitzin, Paul Goldberg, Alejo Nevado-Holgado
Deep neural network models have recently achieved state-of-the-art performance gains in a variety of natural language processing (NLP) tasks (Young, Hazarika, Poria, & Cambria, 2017). However, these gains rely on the availability of large amounts of annotated examples, without which state-of-the-art performance is rarely achievable. This is especially inconvenient for the many NLP fields where annotated examples are scarce, such as medical text. To improve NLP models in this situation, we evaluate five improvements on named entity recognition (NER) tasks when only ten annotated examples are available: (1) layer-wise initialization with pre-trained weights, (2) hyperparameter tuning, (3) combining pre-training data, (4) custom word embeddings, and (5) optimizing out-of-vocabulary (OOV) words. Experimental results show that the F1 score of 69.3% achievable by state-of-the-art models can be improved to 78.87%.
Tweets
arxivml: "Few-shot Learning for Named Entity Recognition in Medical Text", Maximilian Hofer, Andrey Kormilitzin, Paul Goldbe… https://t.co/BP7jegbU6v
nmfeeds: [CL] https://t.co/kIB5k8IEhe Few-shot Learning for Named Entity Recognition in Medical Text. Deep neural network models ha...
nmfeeds: [O] https://t.co/kIB5k8IEhe Few-shot Learning for Named Entity Recognition in Medical Text. Deep neural network models hav...
_makoh_: "Few-shot Learning for Named Entity Recognition in Medical Text. (arXiv:1811.05468v1 [https://t.co/Elc9rIUsHa])" https://t.co/Ag2Zo6CxqF #arxiv #feedly
StatsPapers: Few-shot Learning for Named Entity Recognition in Medical Text. https://t.co/uuqJtLaCr6
kormilitzin: Great work of @maximilianhofer on few-shot learning for medical texts: https://t.co/ts3CwvOT4B
arxiv_cscl: Few-shot Learning for Named Entity Recognition in Medical Text https://t.co/LNiQ0if8kv
arxiv_cscl: Few-shot Learning for Named Entity Recognition in Medical Text https://t.co/LNiQ0hXxsX
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 4908
Unique Words: 1865

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 57,756 papers.
