Top 10 arXiv Papers Today in Information Retrieval


2.078 Mikeys
#1. Boosting Search Performance Using Query Variations
Rodger Benham, Joel Mackenzie, Alistair Moffat, J. Shane Culpepper
Rank fusion is a powerful technique that allows multiple sources of information to be combined into a single result set. However, to date fusion has not been regarded as being cost-effective in cases where strict per-query efficiency guarantees are required, such as in web search. In this work we propose a novel solution to rank fusion by splitting the computation into two parts -- one phase that is carried out offline to generate pre-computed centroid answers for queries with broadly similar information needs, and then a second online phase that uses the corresponding topic centroid to compute a result page for each query. We explore efficiency improvements to classic fusion algorithms whose costs can be amortized as a pre-processing step, and can then be combined with re-ranking approaches to dramatically improve effectiveness in multi-stage retrieval systems with little efficiency overhead at query time. Experimental results using the ClueWeb12B collection and the UQV100 query variations demonstrate that centroid-based...
Figures
None.
Tweets
Github
Repository: Variable-BMW
User: rossanoventurini
Language: C++
Stargazers: 5
Subscribers: 4
Forks: 3
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 12664
Unique Words: 3436
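As a rough illustration of the kind of classic rank fusion the abstract for paper #1 refers to, the sketch below applies reciprocal rank fusion (RRF) to the ranked lists returned for several variations of one query. The abstract does not say which fusion algorithm the authors use or how their centroids are built, so RRF, the function name `rrf_fuse`, the constant `k=60`, and the toy document IDs are all assumptions chosen for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default, assumed here.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical result lists for three variations of one information need.
variation_runs = [
    ["d1", "d2", "d3"],
    ["d2", "d1", "d4"],
    ["d2", "d3", "d1"],
]
fused = rrf_fuse(variation_runs)
print(fused)
```

In the offline/online split that the abstract describes, a fused list of this kind could plausibly be computed ahead of time for a group of related queries and looked up at query time, so the fusion cost is not paid per query.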

2.037 Mikeys
#2. ML-Net: multi-label classification of biomedical texts with deep neural networks
Jingcheng Du, Qingyu Chen, Yifan Peng, Yang Xiang, Cui Tao, Zhiyong Lu
Background: Multi-label text classification is a type of text classification in which each text can be assigned one or more labels. Multi-label text classification, which has broad applications in the biomedical domain, is often considered harder than other types of text classification, as each document can be assigned an indeterminate number of labels. Methods: In this work, we propose ML-Net, a novel end-to-end deep learning framework for multi-label classification of biomedical tasks. ML-Net combines a label prediction network with a label count prediction network, which can determine the output labels based on both label confidence scores and document context in an end-to-end manner. We evaluated ML-Net on publicly available multi-label biomedical text classification tasks from both the biomedical literature domain and the clinical domain. Example-based metrics including precision, recall and f-measure were calculated. We compared ML-Net with both traditional machine learning baseline models as well as classic...
Figures
Tweets
arxivml: "ML-Net: multi-label classification of biomedical texts with deep neural networks", Jingcheng Du, Qingyu Chen, Yifa… https://t.co/6FtOD7zOkH
nmfeeds: [CL] https://t.co/lPwoJZB7fc ML-Net: multi-label classification of biomedical texts with deep neural networks. Background:...
nmfeeds: [O] https://t.co/lPwoJZB7fc ML-Net: multi-label classification of biomedical texts with deep neural networks. Background: ...
StatsPapers: ML-Net: multi-label classification of biomedical texts with deep neural networks. https://t.co/pyvamrvFNj
arxiv_cscl: ML-Net: multi-label classification of biomedical texts with deep neural networks https://t.co/8k5jUfO6p3
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 4825
Unique Words: 1692
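The abstract for paper #2 describes pairing per-label confidence scores with a separately predicted label count. One simple way to combine the two, sketched below, is to keep the top-scoring labels up to the predicted count. This is an assumption about how the outputs might be combined, not the authors' implementation: the function `select_labels`, the hard-coded scores, and the label names are hypothetical stand-ins for what the two networks would produce.

```python
def select_labels(confidences, predicted_count):
    """Return the `predicted_count` labels with the highest confidence.

    `confidences` maps label -> score (stand-in for a label prediction
    network); `predicted_count` stands in for a label count network.
    """
    # Clamp the count to the valid range [0, number of labels].
    count = max(0, min(predicted_count, len(confidences)))
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(confidences, key=lambda label: (-confidences[label], label))
    return ranked[:count]


# Hypothetical outputs for one document.
scores = {"neoplasms": 0.91, "humans": 0.85, "mice": 0.40, "aged": 0.12}
labels = select_labels(scores, predicted_count=2)
print(labels)
```

Compared with a fixed confidence threshold, a per-document count lets the number of emitted labels vary with the document, which is the motivation the abstract gives for the count network.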

2.03 Mikeys
#3. Automatic event detection in microblogs using incremental machine learning
Tharindu Rukshan Bandaragoda, Daswin De Silva, Damminda Alahakoon
The global popularity of microblogs has led to an increasing accumulation of large volumes of text data on microblogging platforms such as Twitter. These corpora are untapped resources to understand social expressions on diverse subjects. Microblog analysis aims to unlock the value of such expressions by discovering insights and events of significance hidden among swathes of text. Besides velocity, the diversity of content, brevity, absence of structure and time-sensitivity are key challenges in microblog analysis. In this paper, we propose an unsupervised incremental machine learning and event detection technique to address these challenges. The proposed technique separates a microblog discussion into topics to address the key problem of diversity. It maintains a record of the evolution of each topic over time. Brevity, time-sensitivity and unstructured nature are addressed by these individual topic pathways, which contribute to generating a temporal, topic-driven structure of a microblog discussion. The proposed event detection method...
Figures
Tweets
arxivml: "Automatic event detection in microblogs using incremental machine learning", Tharindu Rukshan Bandaragoda, Daswin … https://t.co/ikGWYG7Gw4
nmfeeds: [CL] https://t.co/4Sr8fzVaTV Automatic event detection in microblogs using incremental machine learning. The global popula...
nmfeeds: [O] https://t.co/4Sr8fzVaTV Automatic event detection in microblogs using incremental machine learning. The global popular...
arxiv_cscl: Automatic event detection in microblogs using incremental machine learning https://t.co/Sw51mEta23
ComputerPapers: Automatic event detection in microblogs using incremental machine learning. https://t.co/K5hAMF8zTe
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 9568
Unique Words: 2541

0.0 Mikeys
#4. An Axiomatic Study of Query Terms Order in Ad-hoc Retrieval
Ayyoob Imani, Amir Vakili, Ali Montazer, Azadeh Shakery
Classic retrieval methods use simple bag-of-word representations for queries and documents. This representation fails to capture the full semantic richness of queries and documents. More recent retrieval models have tried to overcome this deficiency by using approaches such as incorporating dependencies between query terms, using bi-gram representations of documents, proximity heuristics, and passage retrieval. While some of these previous works have implicitly accounted for term order, to the best of our knowledge, term order has not been the primary focus of any research. In this paper, we focus solely on the effect of term order in information retrieval. We will show that documents that have two query terms in the same order as in the query have a higher probability of being relevant than documents that have two query terms in the reverse order. Using the axiomatic framework for information retrieval, we introduce a constraint that retrieval models must adhere to in order to effectively utilize term order dependency among query...
Figures
None.
Tweets
ComputerPapers: An Axiomatic Study of Query Terms Order in Ad-hoc Retrieval. https://t.co/rU6Gp4DLyp
Github
None.
Youtube
None.
Other stats
Sample Sizes : [8]
Authors: 4
Total Words: 3456
Unique Words: 1070

0.0 Mikeys
#5. Privacy-Adversarial User Representations in Recommender Systems
Yehezkel S. Resheff, Yanai Elazar, Moni Shahar, Oren Sar Shalom
Latent factor models for recommender systems represent users and items as low dimensional vectors. Privacy risks have been previously studied mostly in the context of recovery of personal information in the form of usage records from the training data. However, the user representations themselves may be used together with external data to recover private user information such as gender and age. In this paper we show that user vectors calculated by a common recommender system can be exploited in this way. We propose the privacy-adversarial framework to eliminate such leakage, and study the trade-off between recommender performance and leakage both theoretically and empirically using a benchmark dataset. We briefly discuss further applications of this method towards the generation of deeper and more insightful recommendations.
Figures
None.
Tweets
StatsPapers: Privacy-Adversarial User Representations in Recommender Systems. https://t.co/KCYor1mlKx
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 4400
Unique Words: 1649

0.0 Mikeys
#6. Limits to Surprise in Recommender Systems
Andre Paulino de Lima, Sarajane Marques Peres
In this study, we address the challenge of measuring the ability of a recommender system to make surprising recommendations. Although current evaluation methods make it possible to determine if two algorithms can make recommendations with a significant difference in their average surprise measure, it could be of interest to our community to know how competent an algorithm is at embedding surprise in its recommendations, without having to resort to making a direct comparison with another algorithm. We argue that a) surprise is a finite resource in a recommender system, b) there is a limit to how much surprise any algorithm can embed in a recommendation, and c) this limit can provide us with a scale against which the performance of any algorithm can be measured. By exploring these ideas, it is possible to define the concepts of maximum and minimum potential surprise and design a surprise metric called "normalised surprise" that employs these limits to potential surprise. Two experiments were conducted to test the proposed metric....
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 9374
Unique Words: 2597
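The abstract for paper #6 says the "normalised surprise" metric uses the minimum and maximum potential surprise as a scale, but it does not give the formula. The sketch below assumes plain min-max scaling, which is only one plausible reading. The function name, its parameters, and the example numbers are hypothetical.

```python
def normalised_surprise(observed, min_potential, max_potential):
    """Scale an observed surprise value into [0, 1] using assumed limits.

    Assumes min-max scaling: 0 means the recommendation sits at the minimum
    potential surprise, 1 means it reaches the maximum potential surprise.
    """
    if max_potential < min_potential:
        raise ValueError("max_potential must be >= min_potential")
    if max_potential == min_potential:
        # No room for surprise to vary; treat the score as 0 by convention.
        return 0.0
    return (observed - min_potential) / (max_potential - min_potential)


# Hypothetical surprise values for one recommendation list.
score = normalised_surprise(observed=3.0, min_potential=1.0, max_potential=5.0)
print(score)
```

A score on a fixed scale like this can be read for a single algorithm on its own, which matches the abstract's goal of avoiding a direct comparison with another algorithm.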

0.0 Mikeys
#7. A Hybrid Variational Autoencoder for Collaborative Filtering
Kilol Gupta, Mukund Yelahanka Raghuprasad, Pankhuri Kumar
In today's day and age when almost every industry has an online presence with users interacting in online marketplaces, personalized recommendations have become quite important. Traditionally, the problem of collaborative filtering has been tackled using Matrix Factorization which is linear in nature. We extend the work of [11] on using variational autoencoders (VAEs) for collaborative filtering with implicit feedback by proposing a hybrid, multi-modal approach. Our approach combines movie embeddings (learned from a sibling VAE network) with user ratings from the Movielens 20M dataset and applies it to the task of movie recommendation. We empirically show how the VAE network is empowered by incorporating movie embeddings. We also visualize movie and user embeddings by clustering their latent representations obtained from a VAE.
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 4115
Unique Words: 1595

0.0 Mikeys
#8. Recurrent Neural Networks for Long and Short-Term Sequential Recommendation
Kiewan Villatel, Elena Smirnova, Jérémie Mary, Philippe Preux
Recommender system objectives can be broadly characterized as modeling user preferences over a short- or long-term time horizon. A large body of previous research studied long-term recommendation through dimensionality reduction techniques applied to the historical user-item interactions. A recently introduced session-based recommendation setting highlighted the importance of modeling short-term user preferences. In this task, Recurrent Neural Networks (RNN) have been shown to be successful at capturing the nuances of users' interactions within a short time window. In this paper, we evaluate RNN-based models on both short-term and long-term recommendation tasks. Our experimental results suggest that RNNs are capable of predicting immediate as well as distant user interactions. We also find the best performing configuration to be a stacked RNN with layer normalization and tied item embeddings.
Figures
Tweets
nmfeeds: [O] https://t.co/koWg5F1uDk Recurrent Neural Networks for Long and Short-Term Sequential Recommendation. Recommender syste...
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 5749
Unique Words: 1855

0.0 Mikeys
#9. A Line in the Sand: Recommendation or Ad-hoc Retrieval?
Surya Kallumadi, Bhaskar Mitra, Tereza Iofciu
The popular approaches to recommendation and ad-hoc retrieval tasks are largely distinct in the literature. In this work, we argue that many recommendation problems can also be cast as ad-hoc retrieval tasks. To demonstrate this, we build a solution for the RecSys 2018 Spotify challenge by combining standard ad-hoc retrieval models and using popular retrieval tool sets. We draw a parallel between the playlist continuation task and the task of finding good expansion terms for queries in ad-hoc retrieval, and show that standard pseudo-relevance feedback can be effective as a collaborative filtering approach. We also use ad-hoc retrieval for content-based recommendation by treating the input playlist title as a query and associating all candidate tracks with meta-descriptions extracted from the background data. The recommendations from these two approaches are further supplemented by a nearest neighbor search based on track embeddings learned by a popular neural model. Our final ranked list of recommendations is produced by a...
Figures
None.
Tweets
UnderdogGeek: Our paper "A Line in the Sand: Recommendation or Ad-hoc Retrieval?" \w @kallumadi & @terezaif has been accepted for presentation at #RecsysChallenge2018 workshop. Pre-print: https://t.co/ihRG85bCkF #recsys2018 https://t.co/m2EQ6xZcm8
humphreysheil: Conventional IR pipeline (query expansion, BM25, Learning to Rank) used in the Spotify recommendation challenge by @kallumadi et al (Team BachPropagate, 7th place): https://t.co/G6Fg66LlaC https://t.co/paGB5ZQuyO
AlvaroBarreiroG: Glad to know that our modelling of recommendation as a query expansion problem was succesfully used in the Spotify RecSys challenge by @kallumadi @UnderdogGeek and @terezaif https://t.co/2ROR9FcyI6
rfushimi222: RT @HubBucket: A Line in the Sand: Recommendation or Ad-hoc Information Retrieval ✨https://t.co/gRillnVaLL @HubBucket, @HubDataScience, @…
HubDataScience: RT @HubBucket: A Line in the Sand: Recommendation or Ad-hoc Information Retrieval ✨https://t.co/gRillnVaLL @HubBucket, @HubDataScience, @…
Github

Scripts/ code for Recsys Challenge 2018

Repository: BachPropagate
User: skallumadi
Language: Jupyter Notebook
Stargazers: 2
Subscribers: 3
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 5489
Unique Words: 1866

0.0 Mikeys
#10. SpeedReader: Reader Mode Made Fast and Private
Mohammad Ghasemisharif, Peter Snyder, Andrius Aucinas, Benjamin Livshits
Most popular web browsers include "reader modes" that improve the user experience by removing un-useful page elements. Reader modes reformat the page to hide elements that are not related to the page's main content. Such page elements include site navigation, advertising related videos and images, and most JavaScript. The intended end result is that users can enjoy the content they are interested in, without distraction. In this work, we consider whether the "reader mode" can be widened to also provide performance and privacy improvements. Instead of its use as a post-render feature to clean up the clutter on a page we propose SpeedReader as an alternative multistep pipeline that is part of the rendering pipeline. Once the tool decides during the initial phase of a page load that a page is suitable for reader mode use, it directly applies document tree translation before the page is rendered. Based on our measurements, we believe that SpeedReader can be continuously enabled in order to drastically improve end-user experience,...
Figures
Tweets
arxiv_org: SpeedReader: Reader Mode Made Fast and Private. https://t.co/bRehh0Dbg7 https://t.co/VWUIYp3sTI
Rosenchild: RT @arxiv_org: SpeedReader: Reader Mode Made Fast and Private. https://t.co/bRehh0Dbg7 https://t.co/VWUIYp3sTI
Github
Repository: SpeedReader
User: x4dx65
Language: Python
Stargazers: 0
Subscribers: 0
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 10375
Unique Words: 3140

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 57,756 papers.
