Top 10 arXiv Papers Today in Information Retrieval


0.0 Mikeys
#1. Latent Dirichlet Allocation (LDA) for Topic Modeling of the CFPB Consumer Complaints
Kaveh Bastani, Hamed Namavari, Jeffry Shaffer
A text mining approach is proposed based on latent Dirichlet allocation (LDA) to analyze the Consumer Financial Protection Bureau (CFPB) consumer complaints. The proposed approach aims to extract latent topics in the CFPB complaint narratives, and explores their associated trends over time. The time trends will then be used to evaluate the effectiveness of the CFPB regulations and expectations on financial institutions in creating a consumer oriented culture that treats consumers fairly and prioritizes consumer protection in their decision making processes. The proposed approach can be easily operationalized as a decision support system to automate detection of emerging topics in consumer complaints. Hence, the technology-human partnership between the proposed approach and the CFPB team could certainly improve consumer protections from unfair, deceptive or abusive practices in the financial markets by providing more efficient and effective investigations of consumer complaint narratives.
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 10564
Unique Words: 2898

0.0 Mikeys
#2. RARD II: The 2nd Related-Article Recommendation Dataset
Joeran Beel, Barry Smyth, Andrew Collins
The main contribution of this paper is to introduce and describe a new recommender-systems dataset (RARD II). It is based on data from a recommender-system in the digital library and reference management software domain. As such, it complements datasets from other domains such as books, movies, and music. The RARD II dataset encompasses 89m recommendations, covering an item-space of 24m unique items. RARD II provides a range of rich recommendation data, beyond conventional ratings. For example, in addition to the usual ratings matrices, RARD II includes the original recommendation logs, which provide a unique insight into many aspects of the algorithms that generated the recommendations. In this paper, we summarise the key features of this dataset release, describing how it was generated and discussing some of its unique features.
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 4746
Unique Words: 1900

0.0 Mikeys
#3. A Collective Variational Autoencoder for Top-$N$ Recommendation with Side Information
Yifan Chen, Maarten de Rijke
Recommender systems have been studied extensively due to their practical use in many real-world scenarios. Despite this, generating effective recommendations with sparse user ratings remains a challenge. Side information associated with items has been widely utilized to address rating sparsity. Existing recommendation models that use side information are linear and, hence, have restricted expressiveness. Deep learning has been used to capture non-linearities by learning deep item representations from side information but, as side information is high-dimensional, existing deep models tend to have large input dimensionality, which dominates their overall size. This makes them difficult to train, especially with small numbers of inputs. Rather than learning item representations, which is problematic with high-dimensional side information, in this paper, we propose to learn feature representations through deep learning from side information. Learning feature representations, on the other hand, ensures a sufficient number of inputs...
Figures
None.
Tweets
tmasada: Yifan Chen and Maarten de Rijke. A collective variational autoencoder for top-N recommendation with side information. In 3rd Workshop on Deep Learning for Recommender Systems. ACM, October 2018. https://t.co/dbDh4PrsJ3
ComputerPapers: A Collective Variational Autoencoder for Top-$N$ Recommendation with Side Information. https://t.co/7e5cIoF9fr
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 5830
Unique Words: 1701

0.0 Mikeys
#4. Machine Learning Approaches to Hybrid Music Recommender Systems
Andreu Vall, Gerhard Widmer
Music recommender systems have become a key technology supporting the access to increasingly larger music catalogs in on-line music streaming services, on-line music shops, and private collections. The interaction of users with large music catalogs is a complex phenomenon researched from different disciplines. We survey our works investigating the machine learning and data mining aspects of hybrid music recommender systems (i.e., systems that integrate different recommendation techniques). We proposed hybrid music recommender systems based solely on data and robust to the so-called "cold-start problem" for new music items, favoring the discovery of relevant but non-popular music. We thoroughly studied the specific task of music playlist continuation, by analyzing fundamental playlist characteristics, song feature representations, and the relationship between playlists and the songs therein.
Figures
None.
Tweets
nschaetti: [1807.05858] #MachineLearning Approaches to Hybrid Music Recommender Systems #AI #IA #BigData #DataScience #artificialintelligence #computerintelligence https://t.co/3dIJkJqKc7
AndreuVall: On Thursday I'll present our research on machine learning approaches to hybrid music recommender systems at the Nectar track of #ECMLPKDD2018. Looking forward to it! #recsys @ECMLPKDD https://t.co/jozxbbBUS8
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 1586
Unique Words: 722

0.0 Mikeys
#5. An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation
Hamed Zamani, Markus Schedl, Paul Lamere, Ching-Wei Chen
The ACM Recommender Systems Challenge 2018 focused on the task of automatic music playlist continuation, which is a form of the more general task of sequential recommendation. Given a playlist of arbitrary length with some additional meta-data, the task was to recommend up to 500 tracks that fit the target characteristics of the original playlist. For the RecSys Challenge, Spotify released a dataset of one million user-generated playlists. Participants could compete in two tracks, i.e., main and creative tracks. Participants in the main track were only allowed to use the provided training set; in the creative track, however, the use of external public sources was permitted. In total, 113 teams submitted 1,228 runs to the main track; 33 teams submitted 239 runs to the creative track. The highest performing team in the main track achieved an R-precision of 0.2241, an NDCG of 0.3946, and an average number of recommended songs clicks of 1.784. In the creative track, an R-precision of 0.2233, an NDCG of 0.3939, and a click rate of...
Figures
None.
Tweets
arxiv_org: An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Play... https://t.co/6UxHBfQcOr https://t.co/FzkzjV3JwX
arxivml: "An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation", Hamed… https://t.co/d79CwOIxiC
Memoirs: An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation. https://t.co/gmwkXhTpUQ
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 9881
Unique Words: 3247
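
For readers unfamiliar with the R-precision and NDCG figures quoted in the abstract of paper #5, the following is a minimal sketch of the textbook definitions with binary relevance. The function names and toy track IDs are illustrative assumptions, and the challenge's official scoring may differ in details (for example, in how partial matches are credited), so this should not be read as the organizers' implementation.

```python
import math

def r_precision(recommended, relevant):
    """Share of the first R recommendations that are relevant, with R = |relevant|."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    top_r = recommended[:len(relevant)]
    return sum(1 for track in top_r if track in relevant) / len(relevant)

def ndcg(recommended, relevant):
    """Binary-relevance NDCG; assumes `recommended` contains no duplicates."""
    relevant = set(relevant)
    # DCG: each relevant hit at 0-based index i contributes 1 / log2(i + 2).
    dcg = sum(1.0 / math.log2(i + 2)
              for i, track in enumerate(recommended) if track in relevant)
    # Ideal DCG: all relevant tracks packed at the top of the list
    # (capped at the list length, which is one common convention).
    ideal_hits = min(len(relevant), len(recommended))
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

# Hypothetical playlist continuation: four recommended tracks, three held-out tracks.
recommended = ["t1", "t2", "t3", "t4"]
held_out = {"t1", "t3", "t9"}
print(r_precision(recommended, held_out))
print(ndcg(recommended, held_out))
```

R-precision looks only at the first R positions, whereas NDCG rewards relevant tracks anywhere in the list but discounts them logarithmically by rank, which is why the two numbers in the abstract are not directly comparable.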

0.0 Mikeys
#6. A Distributed and Accountable Approach to Offline Recommender Systems Evaluation
Diego Monti, Giuseppe Rizzo, Maurizio Morisio
Different software tools have been developed with the purpose of performing offline evaluations of recommender systems. However, the results obtained with these tools may not be directly comparable because of subtle differences in the experimental protocols and metrics. Furthermore, it is difficult to analyze in the same experimental conditions several algorithms without disclosing their implementation details. For these reasons, we introduce RecLab, an open source software for evaluating recommender systems in a distributed fashion. By relying on consolidated web protocols, we created RESTful APIs for training and querying recommenders remotely. In this way, it is possible to easily integrate into the same toolkit algorithms realized with different technologies. In detail, the experimenter can perform an evaluation by simply visiting a web interface provided by RecLab. The framework will then interact with all the selected recommenders and it will compute and display a comprehensive set of measures, each representing a different...
Figures
None.
Github

REST-based offline evaluation framework for recommender systems

Repository: reclab
User: D2KLab
Language: Python
Stargazers: 1
Subscribers: 3
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 4494
Unique Words: 1574

0.0 Mikeys
#7. Counterfactual Learning-to-Rank for Additive Metrics and Deep Models
Aman Agarwal, Ivan Zaitsev, Thorsten Joachims
Implicit feedback (e.g., clicks, dwell times) is an attractive source of training data for Learning-to-Rank, but it inevitably suffers from biases such as position bias. It was recently shown how counterfactual inference techniques can provide a rigorous approach for handling these biases, but existing methods are restricted to the special case of optimizing average rank for linear ranking functions. In this work, we generalize the counterfactual learning-to-rank approach to a broad class of additive rank metrics -- like Discounted Cumulative Gain (DCG) and Precision@k -- as well as non-linear deep network models. Focusing on DCG, this conceptual generalization gives rise to two new learning methods that both directly optimize an unbiased estimate of DCG despite the bias in the implicit feedback data. The first, SVM PropDCG, generalizes the Propensity Ranking SVM (SVM PropRank), and we show how the resulting optimization problem can be addressed via the Convex Concave Procedure (CCP). The second, Deep PropDCG, further generalizes...
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 8297
Unique Words: 2242
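
The idea in the abstract of paper #7, an unbiased DCG estimate computed from position-biased clicks, can be illustrated with a generic inverse-propensity-scoring (IPS) sketch. It assumes that examination propensity depends only on the position at which a document was shown, that these propensities are known, and that a click implies relevance. The function name, log format, and numbers are hypothetical; this is not the paper's SVM PropDCG or Deep PropDCG method, only the weighting idea such methods build on.

```python
import math

def ips_dcg_estimate(click_log, new_ranks, propensity):
    """
    Estimate the DCG a candidate ranker would obtain, using logged clicks.

    click_log:  list of (doc_id, shown_position) for clicked documents (1-based positions)
    new_ranks:  dict doc_id -> rank assigned by the candidate ranker (1-based)
    propensity: dict position -> probability that the position was examined when logged
    """
    total = 0.0
    for doc_id, shown_position in click_log:
        # DCG gain the clicked document earns at its rank under the candidate ranker.
        gain = 1.0 / math.log2(1 + new_ranks[doc_id])
        # Divide by the examination propensity so that clicks at rarely examined
        # positions count for more, offsetting position bias in expectation.
        total += gain / propensity[shown_position]
    return total

# Hypothetical log: d1 was clicked at position 1, d7 at position 3.
clicks = [("d1", 1), ("d7", 3)]
propensities = {1: 1.0, 2: 0.5, 3: 0.25}
candidate_ranks = {"d1": 2, "d7": 1}
print(ips_dcg_estimate(clicks, candidate_ranks, propensities))
```

In this toy log the click on d7 is weighted four times as heavily as a click at position 1 would be, because position 3 was assumed to be examined only a quarter of the time.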

0.0 Mikeys
#8. TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets
Pavlos Fafalios, Vasileios Iosifidis, Eirini Ntoutsi, Stefan Dietze
Publicly available social media archives facilitate research in a variety of fields, such as data science, sociology or the digital humanities, where Twitter has emerged as one of the most prominent sources. However, obtaining, archiving and annotating large amounts of tweets is costly. In this paper, we describe TweetsKB, a publicly available corpus of currently more than 1.5 billion tweets, spanning almost 5 years (Jan'13-Nov'17). Metadata information about the tweets as well as extracted entities, hashtags, user mentions and sentiment information are exposed using established RDF/S vocabularies. Next to a description of the extraction and annotation process, we present use cases to illustrate scenarios for entity-centric information exploration, data integration and knowledge discovery facilitated by TweetsKB.
Tweets
ComputerPapers: TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets. https://t.co/DShee11ADM
Github
Repository: AnnotatedTweets2RDF
User: iosifidisvasileios
Language: Java
Stargazers: 0
Subscribers: 0
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 5169
Unique Words: 2077

0.0 Mikeys
#9. Textually Guided Ranking Network for Attentional Image Retweet Modeling
Zhou Zhao, Hanbing Zhan, Lingtao Meng, Jun Xiao, Jun Yu, Min Yang, Fei Wu, Deng Cai
Retweet prediction is a challenging problem in social media sites (SMS). In this paper, we study the problem of image retweet prediction in social media, which predicts the image sharing behavior that the user reposts the image tweets from their followees. Unlike previous studies, we learn a user preference ranking model from their past retweeted image tweets in SMS. We first propose a heterogeneous image retweet modeling network (IRM) that exploits users' past retweeted image tweets with associated contexts, their following relations in SMS and preference of their followees. We then develop a novel attentional multi-faceted ranking network learning framework with textually guided multi-modal neural networks for the proposed heterogeneous IRM network to learn the joint image tweet representations and user preference representations for the prediction task. Extensive experiments on a large-scale dataset from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.
Tweets
arxivml: "Textually Guided Ranking Network for Attentional Image Retweet Modeling", Zhou Zhao, Hanbing Zhan, Lingtao Meng, J… https://t.co/XbdJTfao63
nmfeeds: [O] https://t.co/NJQo2JSj41 Textually Guided Ranking Network for Attentional Image Retweet Modeling. Retweet prediction is...
Memoirs: Textually Guided Ranking Network for Attentional Image Retweet Modeling. https://t.co/wOS5eFpJJu
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 8
Total Words: 9224
Unique Words: 2441

0.0 Mikeys
#10. STTM: A Tool for Short Text Topic Modeling
Jipeng Qiang, Yun Li, Yunhao Yuan, Wei Liu, Xindong Wu
Along with the emergence and popularity of social communications on the Internet, topic discovery from short texts becomes fundamental to many applications that require semantic understanding of textual content. As a rising research field, short text topic modeling presents a new and complementary algorithmic methodology to supplement regular text topic modeling, especially targeting the limited word co-occurrence information in short texts. This paper presents the first comprehensive open-source package, called STTM, for use in Java that integrates the state-of-the-art models of short text topic modeling algorithms, benchmark datasets, and abundant functions for model inference and evaluation. The package is designed to facilitate the expansion of new methods in this research field and make evaluations between the new approaches and existing ones accessible. STTM is open-sourced at https://github.com/qiang2100/STTM.
Github
Repository: STTM
User: qiang2100
Language: Java
Stargazers: 1
Subscribers: 1
Forks: 0
Open Issues: 1
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 1949
Unique Words: 809

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 72,893 papers.
