Top 7 arXiv Papers Today in Information Retrieval


2.049 Mikeys
#1. Decoding the Style and Bias of Song Lyrics
Manash Pratim Barman, Amit Awekar, Sambhav Kothari
The central idea of this paper is to gain a deeper understanding of song lyrics computationally. We focus on two aspects: style and biases of song lyrics. All prior work on these two aspects is limited to manual analysis of a small corpus of song lyrics. In contrast, we analyzed more than half a million songs spread over five decades. We characterize the lyrics style in terms of vocabulary, length, repetitiveness, speed, and readability. We have observed that the style of popular songs differs significantly from that of other songs. We have used distributed representation methods and the WEAT test to measure various gender and racial biases in the song lyrics. We have observed that biases in song lyrics correlate with prior results on human subjects. This correlation indicates that song lyrics reflect the biases that exist in society. The increasing consumption of music and the effect of lyrics on human emotions make this analysis important.
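For readers unfamiliar with WEAT, the sketch below computes the standard WEAT effect size as defined by Caliskan et al. (2017): each target word's association is its mean cosine similarity to attribute set A minus that to attribute set B, and the effect size is the difference in mean association between target sets X and Y divided by the standard deviation over X ∪ Y. It is a generic illustration, not the authors' code; the toy 2-d vectors are hypothetical stand-ins for real word embeddings, and the permutation test that usually accompanies WEAT is omitted.

```python
import math
from statistics import mean, stdev


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to attribute set B."""
    return mean(cosine(w, a) for a in A) - mean(cosine(w, b) for b in B)


def weat_effect_size(X, Y, A, B):
    """Difference in mean association of target sets X and Y,
    divided by the sample standard deviation of associations over X and Y combined."""
    sx = [association(x, A, B) for x in X]
    sy = [association(y, A, B) for y in Y]
    return (mean(sx) - mean(sy)) / stdev(sx + sy)


# Hypothetical toy "embeddings": X leans toward attribute A, Y toward attribute B.
X = [[1.0, 0.0], [0.9, 0.1]]
Y = [[0.0, 1.0], [0.1, 0.9]]
A = [[1.0, 0.0]]
B = [[0.0, 1.0]]
print(round(weat_effect_size(X, Y, A, B), 3))
```

A positive effect size means X is more associated with A (and Y with B); swapping X and Y flips the sign.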
Figures
None.
Tweets
BrundageBot: Decoding the Style and Bias of Song Lyrics. Manash Pratim Barman, Amit Awekar, and Sambhav Kothari https://t.co/SHogxJlFfS
arxivml: "Decoding the Style and Bias of Song Lyrics", Manash Pratim Barman, Amit Awekar, Sambhav Kothari https://t.co/oAgLhK0XGa
arxiv_cscl: Decoding the Style and Bias of Song Lyrics https://t.co/GWKpIbGvzg
AssistedEvolve: RT @BrundageBot: Decoding the Style and Bias of Song Lyrics. Manash Pratim Barman, Amit Awekar, and Sambhav Kothari https://t.co/SHogxJlFfS
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

2.021 Mikeys
#2. On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems
Gabriel de Souza P. Moreira, Dietmar Jannach, Adilson Marques da Cunha
News recommender systems are designed to surface relevant information for online readers by personalizing their user experiences. A particular problem in that context is that online readers are often anonymous, which means that this personalization can only be based on the last few recorded interactions with the user, a setting named session-based recommendation. Another particularity of the news domain is that fresh articles are constantly published and may need to be considered for recommendation immediately. To deal with such an item cold-start problem, it is important to consider the actual content of items when recommending. Hybrid approaches are therefore often considered the method of choice in such settings. In this work, we analyze the importance of considering content information in a hybrid neural news recommender system. We contrast content-aware and content-agnostic techniques and also explore the effects of using different content encodings. Experiments on two public datasets confirm the importance of adopting a hybrid...
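To illustrate why content helps with item cold start in an anonymous session, here is a minimal content-only baseline: the session is summarized as the mean content vector of the articles clicked so far, and candidates are ranked by cosine similarity to it. This is not the CHAMELEON architecture or the paper's hybrid model; it is a hypothetical sketch assuming each article already has a content vector from some text encoder. Its point is that a freshly published article with no click history can still be scored.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_session_content(session_vectors, candidates):
    """Rank candidate articles for an anonymous session using content only.

    session_vectors: content vectors of the articles clicked so far in this session.
    candidates: dict article_id -> content vector (no click history required).
    """
    dim = len(session_vectors[0])
    # Session "profile" = mean content vector of the clicked articles.
    profile = [sum(v[i] for v in session_vectors) / len(session_vectors) for i in range(dim)]
    return sorted(candidates, key=lambda art: cosine(profile, candidates[art]), reverse=True)


# Hypothetical 2-d content vectors: two clicks on sports-like articles.
session = [[1.0, 0.0], [0.8, 0.2]]
fresh = {"sports_new": [0.9, 0.1], "politics_new": [0.1, 0.9]}
print(rank_by_session_content(session, fresh))
```

A hybrid system would combine a content signal like this with a learned model of the click sequence; the content-agnostic alternative has nothing to say about an article nobody has clicked yet.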
Figures
Tweets
arxivml: "On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems", Gabriel de S… https://t.co/w6Ou50k41Q
Memoirs: On the Importance of News Content Representation in Hybrid Neural Session-based Recommender Systems. https://t.co/lkxkfL9hI3
Github

Source code of CHAMELEON - A Deep Learning Meta-Architecture for News Recommender Systems

Repository: chameleon_recsys
User: gabrielspmoreira
Language: Python
Stargazers: 58
Subscribers: 15
Forks: 24
Open Issues: 2
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 4586
Unique Words: 1895

2.018 Mikeys
#3. Flatter is better: Percentile Transformations for Recommender Systems
Masoud Mansoury, Robin Burke, Bamshad Mobasher
It is well known that explicit user ratings in recommender systems are biased towards high ratings, and that users differ significantly in their usage of the rating scale. Implementers usually compensate for these issues through rating normalization or the inclusion of a user bias term in factorization models. However, these methods adjust only for the central tendency of users' distributions. In this work, we demonstrate that lack of flatness in rating distributions is negatively correlated with recommendation performance. We propose a rating transformation model that compensates for skew in the rating distribution as well as its central tendency by converting ratings into percentile values as a pre-processing step before recommendation generation. This transformation flattens the rating distribution, better compensates for differences in rating distributions, and improves recommendation performance. We also show a smoothed version of this transformation designed to yield more intuitive results for users with very narrow...
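A minimal sketch of a per-user percentile transformation, assuming the mid-rank convention (a rating's percentile counts all lower ratings plus half of the tied ones). The paper's exact percentile definition and its smoothed variant may differ; this only shows the general idea that users with different central tendencies but the same ordering of items end up with identical transformed values.

```python
from bisect import bisect_left, bisect_right


def percentile_transform(ratings):
    """Convert one user's ratings (dict item -> rating) into percentiles in (0, 100).

    Mid-rank convention (an assumption): count the user's strictly lower ratings
    plus half of the ratings tied with this one, so tied ratings share one value.
    """
    values = sorted(ratings.values())
    n = len(values)
    result = {}
    for item, r in ratings.items():
        below = bisect_left(values, r)            # ratings strictly lower than r
        equal = bisect_right(values, r) - below   # ratings equal to r (including r)
        result[item] = 100.0 * (below + 0.5 * equal) / n
    return result


# Two hypothetical users with different rating levels but the same item ordering.
lenient = {"x": 3, "y": 4, "z": 5}
harsh = {"x": 1, "y": 2, "z": 3}
print(percentile_transform(lenient) == percentile_transform(harsh))  # True
```

Unlike subtracting a user mean, this also absorbs differences in spread and skew, because only each rating's position within the user's own distribution matters.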
Figures
None.
Tweets
arxivml: "Flatter is better: Percentile Transformations for Recommender Systems", Masoud Mansoury, Robin Burke, Bamshad Moba… https://t.co/qbrl1VZXlh
Memoirs: Flatter is better: Percentile Transformations for Recommender Systems. https://t.co/rpLLTElt8H
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 0
Unique Words: 0

2.017 Mikeys
#4. A Novel Approach for Detection and Ranking of Trendy and Emerging Cyber Threat Events in Twitter Streams
Avishek Bose, Vahid Behzadan, Carlos Aguirre, William H. Hsu
We present a new machine learning and text information extraction approach to detection of cyber threat events in Twitter that are novel (previously non-extant) and developing (marked by significance with respect to similarity with a previously detected event). While some existing approaches to event detection measure novelty and trendiness, typically as independent criteria and occasionally as a holistic measure, this work focuses on detecting both novel and developing events using an unsupervised machine learning approach. Furthermore, our proposed approach enables the ranking of cyber threat events based on an importance score by extracting the tweet terms that are characterized as named entities, keywords, or both. We also impute influence to users in order to assign a weighted score to noun phrases in proportion to user influence and the corresponding event scores for named entities and keywords. To evaluate the performance of our proposed approach, we measure the efficiency and detection error rate for events over a...
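The abstract mentions weighting noun phrases in proportion to user influence and ranking events by an importance score. The sketch below is a hypothetical illustration of that general idea, not the authors' method: influence is assumed to be log(1 + follower count), each phrase accumulates the influence of the users who tweeted it, and an event's importance is taken to be the sum of its phrase scores. Phrase extraction (named entities, keywords) is assumed to have happened already.

```python
import math
from collections import defaultdict


def phrase_scores(tweets, followers):
    """Score noun phrases by the influence of the users who tweeted them.

    tweets: list of (user, phrases) pairs with phrases already extracted.
    followers: dict user -> follower count; influence is assumed to be log(1 + followers).
    """
    scores = defaultdict(float)
    for user, phrases in tweets:
        influence = math.log1p(followers.get(user, 0))
        for phrase in set(phrases):  # count each phrase at most once per tweet
            scores[phrase] += influence
    return dict(scores)


def rank_events(events, scores):
    """Order event ids by importance, assumed here to be the sum of their phrase scores."""
    return sorted(events, key=lambda e: sum(scores.get(p, 0.0) for p in events[e]), reverse=True)


# Hypothetical tweets, follower counts, and events.
tweets = [("alice", ["ransomware", "hospital"]), ("bob", ["ransomware"]), ("carol", ["phishing"])]
followers = {"alice": 1000, "bob": 10, "carol": 100}
scores = phrase_scores(tweets, followers)
print(rank_events({"e1": ["ransomware"], "e2": ["phishing"]}, scores))
```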
Figures
Tweets
arxivml: "A Novel Approach for Detection and Ranking of Trendy and Emerging Cyber Threat Events in Twitter Streams", Avishek… https://t.co/SQTpOD21Iz
StatsPapers: A Novel Approach for Detection and Ranking of Trendy and Emerging Cyber Threat Events in Twitter Streams. https://t.co/oOorLA6Juk
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 6906
Unique Words: 2216

2.014 Mikeys
#5. Leveraging Linguistic Characteristics for Bipolar Disorder Recognition with Gender Differences
Yen-Hao Huang, Yi-Hsin Chen, Fernando Henrique Calderon Alvarado, Ssu-Rui Lee, Shu-I Wu, Yuwen Lai, Yi-Shin Chen
Most previous studies on automatic recognition models for bipolar disorder (BD) were based on both social media and linguistic features. The present study investigates the possibility of adopting only language-based features, namely syntax and morpheme collocation. We also examine the effect of gender on the results, since gender has long been recognized as an important modulating factor for mental disorders, yet it has received little attention in previous linguistic models. The present study collects Twitter posts from the 3 months prior to self-disclosure by 349 BD users (231 female, 118 male). We construct a set of syntactic patterns in terms of word usage based on graph pattern construction and a pattern attention mechanism. The factors examined are gender differences, syntactic patterns, and bipolar recognition performance. Our F1 scores exceed 91% and outperform several baselines, including those using TF-IDF, LIWC and pre-trained language models (ELMo and BERT). The contributions of the...
Figures
Tweets
arxivml: "Leveraging Linguistic Characteristics for Bipolar Disorder Recognition with Gender Differences", Yen-Hao Huang, Yi… https://t.co/VGmNcPsum2
arxiv_cscl: Leveraging Linguistic Characteristics for Bipolar Disorder Recognition with Gender Differences https://t.co/o3ci3en7HM
arxiv_cscl: Leveraging Linguistic Characteristics for Bipolar Disorder Recognition with Gender Differences https://t.co/o3ci3eEIzk
Github
None.
Youtube
None.
Other stats
Sample Sizes: [349]
Authors: 7
Total Words: 4610
Unique Words: 1662

2.001 Mikeys
#6. Unbiased Learning to Rank: Counterfactual and Online Approaches
Harrie Oosterhuis, Rolf Jagerman, Maarten de Rijke
This tutorial covers and contrasts the two main methodologies in unbiased Learning to Rank (LTR): Counterfactual LTR and Online LTR. There has long been interest in LTR from user interactions; however, this form of implicit feedback is heavily biased. In recent years, unbiased LTR methods have been introduced to remove the effect of different types of bias caused by user behavior in search. For instance, a well-addressed type of bias is position bias: the rank at which a document is displayed heavily affects the interactions it receives. Counterfactual LTR methods deal with such types of bias by learning from historical interactions while correcting for the effect of the explicitly modelled biases. In contrast, Online LTR does not use an explicit user model; it learns through an interactive process where randomized results are displayed to the user. Through randomization, the effect of different types of bias can be removed from the learning process. Though both methodologies lead to unbiased LTR, their approaches differ...
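The counterfactual correction for position bias can be sketched with inverse propensity scoring (IPS). Under a position-based click model, P(click) = p_k × relevance, where p_k is the probability that a user examines rank k; dividing each click by p_k then gives an estimate whose expectation equals the document's relevance, provided the propensities are correct. The propensity form p_k = (1/k)^eta and the log below are assumptions chosen for illustration, not taken from the tutorial.

```python
from collections import defaultdict


def propensity(rank, eta=1.0):
    """Assumed examination probability at 1-based rank k: p_k = (1 / k) ** eta."""
    return (1.0 / rank) ** eta


def ips_relevance(impressions, eta=1.0):
    """Inverse-propensity-scored relevance estimate per document.

    impressions: iterable of (doc_id, rank, clicked) from logged rankings, clicked in {0, 1}.
    Each click is up-weighted by 1 / p_rank to undo the effect of position bias.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for doc, rank, clicked in impressions:
        totals[doc] += clicked / propensity(rank, eta)
        counts[doc] += 1
    return {doc: totals[doc] / counts[doc] for doc in totals}


# Hypothetical log: "a" is always shown at rank 1, "b" at rank 2; both are equally relevant.
log = [("a", 1, c) for c in (1, 1, 0, 0)] + [("b", 2, c) for c in (1, 0, 0, 0)]
print(ips_relevance(log))  # → {'a': 0.5, 'b': 0.5}
```

The raw click-through rates in this log are 0.5 for "a" and 0.25 for "b", so a naive learner would prefer "a" purely because of where it was shown; the IPS estimate removes that gap. Online LTR, by contrast, avoids an explicit propensity model and relies on randomizing the displayed results.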
Figures
None.
Tweets
HarrieOos: Together with @RolfJagerman @mdr, we present part III about Unbiased LTR: learning from user interactions. For more info see: https://t.co/tMo2fExzBO Slides are also online: https://t.co/KETqXFndtL 2/4
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 3032
Unique Words: 1182

1.996 Mikeys
#7. The Role of Local Intrinsic Dimensionality in Benchmarking Nearest Neighbor Search
Martin Aumüller, Matteo Ceccarello
This paper reconsiders common benchmarking approaches to nearest neighbor search. It is shown that the concept of local intrinsic dimensionality (LID) makes it possible to choose query sets of a wide range of difficulty for real-world datasets. Moreover, the effect of different LID distributions on the running time performance of implementations is empirically studied. To this end, different visualization concepts are introduced that give a more fine-grained overview of the inner workings of nearest neighbor search principles. The paper closes with remarks about the diversity of datasets commonly used for nearest neighbor search benchmarking. It is shown that such real-world datasets are not diverse: results on a single dataset predict results on all other datasets well.
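A query's LID can be estimated from the distances to its k nearest neighbors. The sketch below uses the maximum-likelihood estimator common in the LID literature, LID = -1 / ((1/k) · Σ ln(d_i / d_k)), with d_k the largest of the k distances, and finds neighbors by brute-force Euclidean search. Which estimator and which k the paper uses are not stated in the abstract, so treat both as assumptions.

```python
import math


def lid_mle(knn_distances):
    """Maximum-likelihood LID estimate from the distances to the k nearest neighbors.

    LID = -1 / ((1/k) * sum_i ln(d_i / d_k)), where d_k is the largest of the k distances.
    Distances must be positive and not all equal (otherwise the denominator is zero).
    """
    d = sorted(knn_distances)
    d_k = d[-1]
    mean_log = sum(math.log(x / d_k) for x in d) / len(d)
    return -1.0 / mean_log


def query_lid(query, data, k):
    """LID of one query point with respect to a dataset, via brute-force Euclidean k-NN.

    Zero distances (the query itself or exact duplicates) are skipped.
    """
    dists = sorted(math.dist(query, p) for p in data)
    knn = [x for x in dists if x > 0][:k]
    return lid_mle(knn)


# Hypothetical 1-d data: the 3 nearest neighbors of 0.0 are at distances 1, 2, 4.
data = [(1.0,), (2.0,), (4.0,), (8.0,)]
print(query_lid((0.0,), data, k=3))  # equals 1 / ln 2
```

To build benchmark query sets of different difficulty, one could sort candidate queries by this value, e.g. `sorted(queries, key=lambda q: query_lid(q, data, k))`, and sample from the low and high ends; higher LID generally indicates a harder query.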
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 5488
Unique Words: 1773

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repositories) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 160,428 papers.
