Top 6 Arxiv Papers Today in Information Retrieval


2.038 Mikeys
#1. Distributed Vector Representations of Folksong Motifs
Aitor Arronte-Alvarez, Francisco Gómez-Martin
This article presents a distributed vector representation model for learning folksong motifs. A skip-gram version of word2vec with negative sampling is used to represent high quality embeddings. Motifs from the Essen Folksong collection are compared based on their cosine similarity. A new evaluation method for testing the quality of the embeddings based on a melodic similarity task is presented to show how the vector space can represent complex contextual features, and how it can be utilized for the study of folksong variation.
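As a rough illustration of the cosine-similarity comparison mentioned in the abstract above, here is a minimal sketch. It is not the authors' code: the vectors `motif_a`, `motif_b`, and `motif_c` are hypothetical three-dimensional embeddings with made-up values, whereas real skip-gram embeddings would be learned from the corpus and have many more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

# Hypothetical motif embeddings (illustrative values only).
motif_a = [1.0, 2.0, 0.0]
motif_b = [2.0, 4.0, 0.0]   # same direction as motif_a
motif_c = [0.0, 0.0, 3.0]   # orthogonal to motif_a

sim_ab = cosine_similarity(motif_a, motif_b)
sim_ac = cosine_similarity(motif_a, motif_c)
```

Under these assumed values, `sim_ab` is close to 1 (the vectors point the same way) and `sim_ac` is 0 (they share no direction), which is the sense in which nearby embeddings are treated as similar motifs.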
Tweets
arxivml: "Distributed Vector Representations of Folksong Motifs", Aitor Arronte-Alvarez, Francisco Gómez-Martin https://t.co/sfKH5Kzilp
arxiv_cs_LG: Distributed Vector Representations of Folksong Motifs. Aitor Arronte-Alvarez and Francisco Gómez-Martin https://t.co/vsc4tFIkD8
Memoirs: Distributed Vector Representations of Folksong Motifs. https://t.co/gkUzpQ07Mv
arxiv_cscl: Distributed Vector Representations of Folksong Motifs https://t.co/hAielISmM2
arxiv_cscl: Distributed Vector Representations of Folksong Motifs https://t.co/hAielIALns
insurrealist: RT @Memoirs: Distributed Vector Representations of Folksong Motifs. https://t.co/gkUzpQ07Mv
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 2886
Unique Words: 1157

2.007 Mikeys
#2. Empirical Evaluations of Seed Set Selection Strategies for Predictive Coding
Christian J. Mahoney, Nathaniel Huber-Fliflet, Katie Jensen, Haozhen Zhao, Robert Neary, Shi Ye
Training documents have a significant impact on the performance of predictive models in the legal domain. Yet, there is limited research that explores the effectiveness of the training document selection strategy - in particular, the strategy used to select the seed set, or the set of documents an attorney reviews first to establish an initial model. Since there is limited research on this important component of predictive coding, the authors of this paper set out to identify strategies that consistently perform well. Our research demonstrated that the seed set selection strategy can have a significant impact on the precision of a predictive model. Enabling attorneys with the results of this study will allow them to initiate the most effective predictive modeling process to comb through the terabytes of data typically present in modern litigation. This study used documents from four actual legal cases to evaluate eight different seed set selection strategies. Attorneys can use the results contained within this paper to enhance...
Tweets
arxivml: "Empirical Evaluations of Seed Set Selection Strategies for Predictive Coding", Christian J. Mahoney, Nathaniel Hub… https://t.co/ztBHUr8EK4
SciFi: Empirical Evaluations of Seed Set Selection Strategies for Predictive Coding. https://t.co/12kzePOpSO
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 6
Total Words: 5168
Unique Words: 1498

2.004 Mikeys
#3. On Extracting Data from HTML Tables
Juan C. Roldán, Patricia Jiménez, Rafael Corchuelo
The Web provides many data in user-friendly tabular formats that are encoded using HTML. Information extractors are intended to extract those data as datasets that can feed business applications. There exist many proposals to implement them, which has motivated several previous surveys. Unfortunately, they are outdated and we do not think that it suffices to update them because they do not provide a good conceptual framework, they do not provide a taxonomy of web tables, they do not analyse the exact tasks involved, and they do not provide a good comparison framework. This article presents a review of the literature that does not have any of the previous problems, which we hope will be useful to both researchers and practitioners.
Tweets
arxiv_org: On Extracting Data from HTML Tables. https://t.co/dnrcY34L7P https://t.co/6gpyElVmb9
ComputerPapers: On Extracting Data from HTML Tables. https://t.co/FIpBR0qlos
BrianTRice: RT @arxiv_org: On Extracting Data from HTML Tables. https://t.co/dnrcY34L7P https://t.co/6gpyElVmb9
shubh_300595: RT @arxiv_org: On Extracting Data from HTML Tables. https://t.co/dnrcY34L7P https://t.co/6gpyElVmb9
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 14995
Unique Words: 3056

2.002 Mikeys
#4. Modelling Sequential Music Track Skips using a Multi-RNN Approach
Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma
Modelling sequential music skips provides streaming companies the ability to better understand the needs of the user base, resulting in a better user experience by reducing the need to manually skip certain music tracks. This paper describes the solution of the University of Copenhagen DIKU-IR team in the 'Spotify Sequential Skip Prediction Challenge', where the task was to predict the skip behaviour of the second half in a music listening session conditioned on the first half. We model this task using a Multi-RNN approach consisting of two distinct stacked recurrent neural networks, where one network focuses on encoding the first half of the session and the other network focuses on utilizing the encoding to make sequential skip predictions. The encoder network is initialized by a learned session-wide music encoding, and both of them utilize a learned track embedding. Our final model consists of a majority voted ensemble of individually trained models, and ranked 2nd out of 45 participating teams in the competition with a mean...
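The majority-voted ensemble mentioned in the abstract above can be sketched as follows. This is an illustration only, not the team's implementation: the function name `majority_vote`, the binary 0/1 encoding of skip predictions, the tie rule, and the sample predictions are all assumptions.

```python
def majority_vote(model_predictions):
    """Combine per-model binary skip predictions by majority vote.

    model_predictions: list of equal-length lists of 0/1 values,
    one inner list per model, one value per track position.
    Ties resolve to 0 (not skipped); this tie rule is an assumption.
    """
    n_models = len(model_predictions)
    if n_models == 0:
        raise ValueError("need at least one model")
    length = len(model_predictions[0])
    if any(len(p) != length for p in model_predictions):
        raise ValueError("all models must predict the same number of tracks")
    combined = []
    for position in range(length):
        # Count how many models predict a skip at this track position.
        skip_votes = sum(p[position] for p in model_predictions)
        combined.append(1 if 2 * skip_votes > n_models else 0)
    return combined

# Hypothetical predictions from three models for a four-track session half.
hypothetical_predictions = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
ensemble = majority_vote(hypothetical_predictions)
```

With an odd number of models and binary labels, no ties occur, so the tie rule only matters for even-sized ensembles.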
Figures
None.
Tweets
BrundageBot: Modelling Sequential Music Track Skips using a Multi-RNN Approach. Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, and Christina Lioma https://t.co/iuBcxUtclp
arxivml: "Modelling Sequential Music Track Skips using a Multi-RNN Approach", Christian Hansen, Casper Hansen, Stephen Alstr… https://t.co/XkmRdqhp3P
arxiv_cs_LG: Modelling Sequential Music Track Skips using a Multi-RNN Approach. Christian Hansen, Casper Hansen, Stephen Alstrup, Jakob Grue Simonsen, and Christina Lioma https://t.co/JoEryiVZ7p
Memoirs: Modelling Sequential Music Track Skips using a Multi-RNN Approach. https://t.co/zwSBLVnD80
Github
Repository: WSDM-challenge-2019-spotify
User: Varyn
Language: Python
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 3674
Unique Words: 1155

2.0 Mikeys
#5. A Graph-structured Dataset for Wikipedia Research
Nicolas Aspert, Volodymyr Miz, Benjamin Ricaud, Pierre Vandergheynst
Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. While being a scientific treasure, the large size of the dataset hinders pre-processing and may be a challenging obstacle for potential new studies. This issue is particularly acute in scientific domains where researchers may not be technically and data processing savvy. On one hand, the size of Wikipedia dumps is large. It makes the parsing and extraction of relevant information cumbersome. On the other hand, the API is straightforward to use but restricted to a relatively small number of requests. The middle ground is at the mesoscopic scale when researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages but there exists no efficient solution at this scale. In this work, we...
Tweets
WikiResearch: "A Graph-structured Dataset for Wikipedia Research", a convenient tool for researchers working on Wikipedia to rapidly access viewership statistics and subgraphs of Wikipedia articles and categories. (@naspert,@mizvladimir,@GBR_Data, @trekkinglemon 2019) https://t.co/N3TX3j0eF9 https://t.co/opQctRFJcZ
arxivml: "A Graph-structured Dataset for Wikipedia Research", Nicolas Aspert, Volodymyr Miz, Benjamin Ricaud, Pierre Vanderg… https://t.co/oFoeQRqpk2
arxiv_cs_LG: A Graph-structured Dataset for Wikipedia Research. Nicolas Aspert, Volodymyr Miz, Benjamin Ricaud, and Pierre Vandergheynst https://t.co/WSBI6JWET3
Memoirs: A Graph-structured Dataset for Wikipedia Research. https://t.co/crJ9N6ARuq
SRoyLee: A Graph-structured Dataset for Wikipedia Research - https://t.co/EAy0Btd24o
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 5046
Unique Words: 1731

2.0 Mikeys
#6. Neural Check-Worthiness Ranking with Weak Supervision: Finding Sentences for Fact-Checking
Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, Christina Lioma
Automatic fact-checking systems detect misinformation, such as fake news, by (i) selecting check-worthy sentences for fact-checking, (ii) gathering related information to the sentences, and (iii) inferring the factuality of the sentences. Most prior research on (i) uses hand-crafted features to select check-worthy sentences, and does not explicitly account for the recent finding that the top weighted terms in both check-worthy and non-check-worthy sentences are actually overlapping [15]. Motivated by this, we present a neural check-worthiness sentence ranking model that represents each word in a sentence by both its embedding (aiming to capture its semantics) and its syntactic dependencies (aiming to capture its role in modifying the semantics of other terms in the sentence). Our model is an end-to-end trainable neural network for check-worthiness ranking, which is trained on large amounts of unlabelled data through weak supervision. Thorough experimental evaluation against state of the art baselines, with and without...
Tweets
arxivml: "Neural Check-Worthiness Ranking with Weak Supervision: Finding Sentences for Fact-Checking", Casper Hansen, Christ… https://t.co/QWkOvFPNtw
arxiv_cs_LG: Neural Check-Worthiness Ranking with Weak Supervision: Finding Sentences for Fact-Checking. Casper Hansen, Christian Hansen, Stephen Alstrup, Jakob Grue Simonsen, and Christina Lioma https://t.co/zb18gmEkhz
Memoirs: Neural Check-Worthiness Ranking with Weak Supervision: Finding Sentences for Fact-Checking. https://t.co/aeU9Fen56T
arxiv_cscl: Neural Check-Worthiness Ranking with Weak Supervision: Finding Sentences for Fact-Checking https://t.co/I59a1gr3G9
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 5671
Unique Words: 1952

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter: @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 99,586 papers.
