### Top 10 arXiv Papers Today in Information Retrieval

##### #1. Latent Dirichlet Allocation (LDA) for Topic Modeling of the CFPB Consumer Complaints
###### Kaveh Bastani, Hamed Namavari, Jeffry Shaffer
A text mining approach is proposed based on latent Dirichlet allocation (LDA) to analyze the Consumer Financial Protection Bureau (CFPB) consumer complaints. The proposed approach aims to extract latent topics in the CFPB complaint narratives and explores their associated trends over time. The time trends will then be used to evaluate the effectiveness of the CFPB regulations and expectations on financial institutions in creating a consumer-oriented culture that treats consumers fairly and prioritizes consumer protection in their decision-making processes. The proposed approach can be easily operationalized as a decision support system to automate detection of emerging topics in consumer complaints. Hence, the technology-human partnership between the proposed approach and the CFPB team could improve consumer protections from unfair, deceptive or abusive practices in the financial markets by providing more efficient and effective investigations of consumer complaint narratives.
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 10564
Unique Words: 2898

##### #2. RARD II: The 2nd Related-Article Recommendation Dataset
###### Joeran Beel, Barry Smyth, Andrew Collins
The main contribution of this paper is to introduce and describe a new recommender-systems dataset (RARD II). It is based on data from a recommender-system in the digital library and reference management software domain. As such, it complements datasets from other domains such as books, movies, and music. The RARD II dataset encompasses 89m recommendations, covering an item-space of 24m unique items. RARD II provides a range of rich recommendation data, beyond conventional ratings. For example, in addition to the usual ratings matrices, RARD II includes the original recommendation logs, which provide a unique insight into many aspects of the algorithms that generated the recommendations. In this paper, we summarise the key features of this dataset release, describing how it was generated and discussing some of its unique features.
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 4746
Unique Words: 1900

##### #3. A Collective Variational Autoencoder for Top-$N$ Recommendation with Side Information
###### Yifan Chen, Maarten de Rijke
Recommender systems have been studied extensively due to their practical use in many real-world scenarios. Despite this, generating effective recommendations with sparse user ratings remains a challenge. Side information associated with items has been widely utilized to address rating sparsity. Existing recommendation models that use side information are linear and, hence, have restricted expressiveness. Deep learning has been used to capture non-linearities by learning deep item representations from side information, but because side information is high-dimensional, existing deep models tend to have large input dimensionality, which dominates their overall size. This makes them difficult to train, especially with small numbers of inputs. Rather than learning item representations, which is problematic with high-dimensional side information, in this paper we propose to learn feature representations through deep learning from side information. Learning feature representations, on the other hand, ensures a sufficient number of inputs...
###### Tweets
tmasada: Yifan Chen and Maarten de Rijke. A collective variational autoencoder for top-N recommendation with side information. In 3rd Workshop on Deep Learning for Recommender Systems. ACM, October 2018. https://t.co/dbDh4PrsJ3
ComputerPapers: A Collective Variational Autoencoder for Top-$N$ Recommendation with Side Information. https://t.co/7e5cIoF9fr
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 5830
Unique Words: 1701

##### #4. Machine Learning Approaches to Hybrid Music Recommender Systems
###### Andreu Vall, Gerhard Widmer
Music recommender systems have become a key technology supporting the access to increasingly larger music catalogs in on-line music streaming services, on-line music shops, and private collections. The interaction of users with large music catalogs is a complex phenomenon researched from different disciplines. We survey our works investigating the machine learning and data mining aspects of hybrid music recommender systems (i.e., systems that integrate different recommendation techniques). We proposed hybrid music recommender systems based solely on data and robust to the so-called "cold-start problem" for new music items, favoring the discovery of relevant but non-popular music. We thoroughly studied the specific task of music playlist continuation, by analyzing fundamental playlist characteristics, song feature representations, and the relationship between playlists and the songs therein.
###### Tweets
nschaetti: [1807.05858] #MachineLearning Approaches to Hybrid Music Recommender Systems #AI #IA #BigData #DataScience #artificialintelligence #computerintelligence https://t.co/3dIJkJqKc7
AndreuVall: On Thursday I'll present our research on machine learning approaches to hybrid music recommender systems at the Nectar track of #ECMLPKDD2018. Looking forward to it! #recsys @ECMLPKDD https://t.co/jozxbbBUS8
###### Other stats
Sample Sizes: None.
Authors: 2
Total Words: 1586
Unique Words: 722

##### #5. An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation
###### Hamed Zamani, Markus Schedl, Paul Lamere, Ching-Wei Chen
The ACM Recommender Systems Challenge 2018 focused on the task of automatic music playlist continuation, which is a form of the more general task of sequential recommendation. Given a playlist of arbitrary length with some additional meta-data, the task was to recommend up to 500 tracks that fit the target characteristics of the original playlist. For the RecSys Challenge, Spotify released a dataset of one million user-generated playlists. Participants could compete in two tracks, i.e., main and creative tracks. Participants in the main track were only allowed to use the provided training set; in the creative track, however, the use of external public sources was permitted. In total, 113 teams submitted 1,228 runs to the main track; 33 teams submitted 239 runs to the creative track. The highest performing team in the main track achieved an R-precision of 0.2241, an NDCG of 0.3946, and an average number of recommended songs clicks of 1.784. In the creative track, an R-precision of 0.2233, an NDCG of 0.3939, and a click rate of...
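For readers unfamiliar with the three metrics quoted in this abstract, the following is a minimal sketch of their textbook forms, not the challenge's official evaluation code. The function names are hypothetical. The sketch assumes binary relevance, track-level matching only, a `log2(rank + 1)` discount, a page size of 10 for the clicks metric, and a fallback value of 51 when no relevant track is recommended; the official scoring may differ in these details (for example, in how partial matches or the ideal ranking are handled).

```python
import math


def r_precision(recommended, relevant):
    """Fraction of the held-out tracks found in the top-|relevant| recommendations."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    top = recommended[:len(relevant)]
    return len(set(top) & relevant) / len(relevant)


def dcg(recommended, relevant):
    """Discounted cumulative gain with binary relevance (assumed discount: log2(rank + 1))."""
    relevant = set(relevant)
    return sum(
        1.0 / math.log2(rank + 1)
        for rank, track in enumerate(recommended, start=1)
        if track in relevant
    )


def ndcg(recommended, relevant):
    """DCG divided by the DCG of an ideal list that ranks all relevant tracks first."""
    ideal_hits = min(len(set(relevant)), len(recommended))
    ideal = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg(recommended, relevant) / ideal if ideal > 0 else 0.0


def recommended_songs_clicks(recommended, relevant, page_size=10, miss_value=51):
    """Number of page refreshes needed before the first relevant track appears.

    page_size and miss_value are assumptions of this sketch.
    """
    relevant = set(relevant)
    for index, track in enumerate(recommended):
        if track in relevant:
            return index // page_size
    return miss_value


if __name__ == "__main__":
    recs = ["a", "b", "c", "d"]
    held_out = ["a", "c", "x"]
    print(r_precision(recs, held_out))
    print(ndcg(recs, held_out))
    print(recommended_songs_clicks(recs, held_out))
```

Higher is better for R-precision and NDCG, while lower is better for the clicks metric, which is why the reported click values sit near 2 rather than near 1.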
###### Tweets
arxiv_org: An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Play... https://t.co/6UxHBfQcOr https://t.co/FzkzjV3JwX
arxivml: "An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation", Hamed… https://t.co/d79CwOIxiC
Memoirs: An Analysis of Approaches Taken in the ACM RecSys Challenge 2018 for Automatic Music Playlist Continuation. https://t.co/gmwkXhTpUQ
###### Other stats
Sample Sizes: None.
Authors: 4
Total Words: 9881
Unique Words: 3247

##### #6. A Distributed and Accountable Approach to Offline Recommender Systems Evaluation
###### Diego Monti, Giuseppe Rizzo, Maurizio Morisio
Different software tools have been developed with the purpose of performing offline evaluations of recommender systems. However, the results obtained with these tools may not be directly comparable because of subtle differences in the experimental protocols and metrics. Furthermore, it is difficult to analyze several algorithms under the same experimental conditions without disclosing their implementation details. For these reasons, we introduce RecLab, an open-source software framework for evaluating recommender systems in a distributed fashion. By relying on consolidated web protocols, we created RESTful APIs for training and querying recommenders remotely. In this way, it is possible to easily integrate into the same toolkit algorithms realized with different technologies. In detail, the experimenter can perform an evaluation by simply visiting a web interface provided by RecLab. The framework will then interact with all the selected recommenders and it will compute and display a comprehensive set of measures, each representing a different...
###### Github

REST-based offline evaluation framework for recommender systems

Repository: reclab
User: D2KLab
Language: Python
Stargazers: 1
Subscribers: 3
Forks: 0
Open Issues: 0
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 4494
Unique Words: 1574

##### #7. Counterfactual Learning-to-Rank for Additive Metrics and Deep Models
###### Aman Agarwal, Ivan Zaitsev, Thorsten Joachims
Implicit feedback (e.g., clicks, dwell times) is an attractive source of training data for Learning-to-Rank, but it inevitably suffers from biases such as position bias. It was recently shown how counterfactual inference techniques can provide a rigorous approach for handling these biases, but existing methods are restricted to the special case of optimizing average rank for linear ranking functions. In this work, we generalize the counterfactual learning-to-rank approach to a broad class of additive rank metrics -- like Discounted Cumulative Gain (DCG) and Precision@k -- as well as non-linear deep network models. Focusing on DCG, this conceptual generalization gives rise to two new learning methods that both directly optimize an unbiased estimate of DCG despite the bias in the implicit feedback data. The first, SVM PropDCG, generalizes the Propensity Ranking SVM (SVM PropRank), and we show how the resulting optimization problem can be addressed via the Convex Concave Procedure (CCP). The second, Deep PropDCG, further generalizes...
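The idea of an unbiased DCG estimate from biased clicks can be illustrated with inverse propensity scoring. The sketch below is an illustration under stated assumptions, not the authors' SVM PropDCG or Deep PropDCG: it only evaluates a candidate ranking rather than learning one, it treats DCG as a gain to maximize, and it assumes a position-based examination model in which the probability of a user examining rank `r` is `(1 / r) ** eta`. The function names and the `eta` parameter are hypothetical.

```python
import math


def position_propensity(logged_rank, eta=1.0):
    """Assumed examination probability at the rank where the document was shown."""
    return (1.0 / logged_rank) ** eta


def ips_dcg(new_ranking, clicks, eta=1.0):
    """Inverse-propensity-weighted DCG estimate for a candidate ranking.

    new_ranking: list of document ids, best first, produced by the ranker under evaluation.
    clicks: dict mapping each clicked document id to the rank at which it was
            displayed by the logging ranker when the click was recorded.
    """
    new_rank = {doc: rank for rank, doc in enumerate(new_ranking, start=1)}
    total = 0.0
    for doc, logged_rank in clicks.items():
        if doc not in new_rank:
            continue  # the candidate ranking does not show this document
        gain = 1.0 / math.log2(1 + new_rank[doc])
        # Dividing by the propensity up-weights clicks that were unlikely to be observed.
        total += gain / position_propensity(logged_rank, eta)
    return total


if __name__ == "__main__":
    logged_clicks = {"d1": 1, "d3": 3}
    print(ips_dcg(["d2", "d1", "d3"], logged_clicks))
    print(ips_dcg(["d3", "d1", "d2"], logged_clicks))
```

A click logged at a low position counts for more than one logged at the top, because under the assumed examination model it was less likely to be seen at all. A ranking that promotes such documents therefore scores higher.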
###### Other stats
Sample Sizes: None.
Authors: 3
Total Words: 8297
Unique Words: 2242

##### #8. TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets
###### Pavlos Fafalios, Vasileios Iosifidis, Eirini Ntoutsi, Stefan Dietze
Publicly available social media archives facilitate research in a variety of fields, such as data science, sociology or the digital humanities, where Twitter has emerged as one of the most prominent sources. However, obtaining, archiving and annotating large amounts of tweets is costly. In this paper, we describe TweetsKB, a publicly available corpus of currently more than 1.5 billion tweets, spanning almost 5 years (Jan'13-Nov'17). Metadata information about the tweets as well as extracted entities, hashtags, user mentions and sentiment information are exposed using established RDF/S vocabularies. Next to a description of the extraction and annotation process, we present use cases to illustrate scenarios for entity-centric information exploration, data integration and knowledge discovery facilitated by TweetsKB.
###### Tweets
ComputerPapers: TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets. https://t.co/DShee11ADM
###### Github
Repository: AnnotatedTweets2RDF
User: iosifidisvasileios
Language: Java
Stargazers: 0
Subscribers: 0
Forks: 0
Open Issues: 0
###### Other stats
Sample Sizes: None.
Authors: 4
Total Words: 5169
Unique Words: 2077

##### #9. Textually Guided Ranking Network for Attentional Image Retweet Modeling
###### Zhou Zhao, Hanbing Zhan, Lingtao Meng, Jun Xiao, Jun Yu, Min Yang, Fei Wu, Deng Cai
Retweet prediction is a challenging problem in social media sites (SMS). In this paper, we study the problem of image retweet prediction in social media, which predicts the image sharing behavior in which users repost image tweets from their followees. Unlike previous studies, we learn a user preference ranking model from users' past retweeted image tweets in SMS. We first propose a heterogeneous image retweet modeling network (IRM) that exploits users' past retweeted image tweets with associated contexts, their following relations in SMS, and the preferences of their followees. We then develop a novel attentional multi-faceted ranking network learning framework with textually guided multi-modal neural networks for the proposed heterogeneous IRM network to learn the joint image tweet representations and user preference representations for the prediction task. Extensive experiments on a large-scale dataset from Twitter show that our method achieves better performance than other state-of-the-art solutions to the problem.
###### Tweets
arxivml: "Textually Guided Ranking Network for Attentional Image Retweet Modeling", Zhou Zhao, Hanbing Zhan, Lingtao Meng, J… https://t.co/XbdJTfao63
nmfeeds: [O] https://t.co/NJQo2JSj41 Textually Guided Ranking Network for Attentional Image Retweet Modeling. Retweet prediction is...
Memoirs: Textually Guided Ranking Network for Attentional Image Retweet Modeling. https://t.co/wOS5eFpJJu
###### Other stats
Sample Sizes: None.
Authors: 8
Total Words: 9224
Unique Words: 2441

##### #10. STTM: A Tool for Short Text Topic Modeling
###### Jipeng Qiang, Yun Li, Yunhao Yuan, Wei Liu, Xindong Wu
Along with the emergence and popularity of social communications on the Internet, topic discovery from short texts has become fundamental to many applications that require semantic understanding of textual content. As a rising research field, short text topic modeling presents a new and complementary algorithmic methodology to supplement regular text topic modeling, and specifically targets the limited word co-occurrence information in short texts. This paper presents the first comprehensive open-source package for Java, called STTM, that integrates state-of-the-art short text topic modeling algorithms, benchmark datasets, and abundant functions for model inference and evaluation. The package is designed to facilitate the development of new methods in this research field and to make comparisons between new approaches and existing ones accessible. STTM is open-sourced at https://github.com/qiang2100/STTM.
###### Github
Repository: STTM
User: qiang2100
Language: Java
Stargazers: 1
Subscribers: 1
Forks: 0
Open Issues: 1
###### Other stats
Sample Sizes: None.
Authors: 5
Total Words: 1949
Unique Words: 809

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

Tracking 72,893 papers.