### Top 10 Arxiv Papers Today in Data Structures And Algorithms

##### #1. Randomisation Algorithms for Large Sparse Matrices
###### Kai Puolamäki, Andreas Henelius, Antti Ukkonen
In many domains it is necessary to generate surrogate networks, e.g., for hypothesis testing of different properties of a network. Furthermore, generating surrogate networks typically requires that different properties of the network is preserved, e.g., edges may not be added or deleted and the edge weights may be restricted to certain intervals. In this paper we introduce a novel efficient property-preserving Markov Chain Monte Carlo method termed CycleSampler for generating surrogate networks in which (i) edge weights are constrained to an interval and node weights are preserved exactly, and (ii) edge and node weights are both constrained to intervals. These two types of constraints cover a wide variety of practical use-cases. The method is applicable to both undirected and directed graphs. We empirically demonstrate the efficiency of the CycleSampler method on real-world datasets. We provide an implementation of CycleSampler in R, with parts implemented in C.
more | pdf | html
None.
###### Tweets
ComputerPapers: Randomisation Algorithms for Large Sparse Matrices. https://t.co/Z0IoNzPghY
###### Github

CycleSampler

Repository: cyclesampler
User: edahelsinki
Language: R
Stargazers: 0
Subscribers: 3
Forks: 0
Open Issues: 0
None.
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 9147
Unqiue Words: 2114

##### #2. Secretary Ranking with Minimal Inversions
###### Sepehr Assadi, Eric Balkanski, Renato Paes Leme
We study a twist on the classic secretary problem, which we term the secretary ranking problem: elements from an ordered set arrive in random order and instead of picking the maximum element, the algorithm is asked to assign a rank, or position, to each of the elements. The rank assigned is irrevocable and is given knowing only the pairwise comparisons with elements previously arrived. The goal is to minimize the distance of the rank produced to the true rank of the elements measured by the Kendall-Tau distance, which corresponds to the number of pairs that are inverted with respect to the true order. Our main result is a matching upper and lower bound for the secretary ranking problem. We present an algorithm that ranks $n$ elements with only $O(n^{3/2})$ inversions in expectation, and show that any algorithm necessarily suffers $\Omega(n^{3/2})$ inversions when there are $n$ available positions. In terms of techniques, the analysis of our algorithm draws connections to linear probing in the hashing literature, while our lower...
more | pdf | html
None.
###### Tweets
DO: Secretary Ranking with Minimal Inversions. https://t.co/K8utC1kZDj
None.
None.
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 11286
Unqiue Words: 2553

##### #3. Prophet Inequalities for Independent Random Variables from an Unknown Distribution
###### José R. Correa, Paul Dütting, Felix Fischer, Kevin Schewior
A central object in optimal stopping theory is the single-choice prophet inequality for independent, identically distributed random variables: Given a sequence of random variables $X_1,\dots,X_n$ drawn independently from a distribution $F$, the goal is to choose a stopping time $\tau$ so as to maximize $\alpha$ such that for all distributions $F$ we have $\mathbb{E}[X_\tau] \geq \alpha \cdot \mathbb{E}[\max_tX_t]$. What makes this problem challenging is that the decision whether $\tau=t$ may only depend on the values of the random variables $X_1,\dots,X_t$ and on the distribution $F$. For quite some time the best known bound for the problem was $\alpha\geq1-1/e\approx0.632$ [Hill and Kertz, 1982]. Only recently this bound was improved by Abolhassani et al. [2017], and a tight bound of $\alpha\approx0.745$ was obtained by Correa et al. [2017]. The case where $F$ is unknown, such that the decision whether $\tau=t$ may depend only on the values of the first $t$ random variables but not on $F$, is equally well motivated (e.g., [Azar...
more | pdf | html
None.
###### Tweets
DO: Prophet Inequalities for Independent Random Variables from an Unknown Distribution. https://t.co/bxk4XaNx6b
None.
None.
###### Other stats
Sample Sizes : None.
Authors: 4
Total Words: 14832
Unqiue Words: 2274

##### #4. Communication-Optimal Distributed Dynamic Graph Clustering
###### Chun Jiang Zhu, Tan Zhu, Kam-Yiu Lam, Song Han, Jinbo Bi
We consider the problem of clustering graph nodes over large-scale dynamic graphs, such as citation networks, images and web networks, when graph updates such as node/edge insertions/deletions are observed distributively. We propose communication-efficient algorithms for two well-established communication models namely the message passing and the blackboard models. Given a graph with $n$ nodes that is observed at $s$ remote sites over time $[1,t]$, the two proposed algorithms have communication costs $\tilde{O}(ns)$ and $\tilde{O}(n+s)$ ($\tilde{O}$ hides a polylogarithmic factor), almost matching their lower bounds, $\Omega(ns)$ and $\Omega(n+s)$, respectively, in the message passing and the blackboard models. More importantly, we prove that at each time point in $[1,t]$ our algorithms generate clustering quality nearly as good as that of centralizing all updates up to that time and then applying a standard centralized clustering algorithm. We conducted extensive experiments on both synthetic and real-life datasets which...
more | pdf | html
None.
###### Tweets
ComputerPapers: Communication-Optimal Distributed Dynamic Graph Clustering. https://t.co/4WGb7FASxD
None.
None.
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 9476
Unqiue Words: 2545

##### #5. Computing Quartet Distance is Equivalent to Counting 4-Cycles
###### Bartłomiej Dudek, Paweł Gawrychowski
The quartet distance is a measure of similarity used to compare two unrooted phylogenetic trees on the same set of $n$ leaves, defined as the number of subsets of four leaves related by a different topology in both trees. After a series of previous results, Brodal et al. [SODA 2013] presented an algorithm that computes this number in $\mathcal{O}(nd\log n)$ time, where $d$ is the maximum degree of a node. Our main contribution is a two-way reduction establishing that the complexity of computing the quartet distance between two trees on $n$ leaves is the same, up to polylogarithmic factors, as the complexity of counting 4-cycles in an undirected simple graph with $m$ edges. The latter problem has been extensively studied, and the fastest known algorithm by Vassilevska Williams [SODA 2015] works in $\mathcal{O}(m^{1.48})$ time. In fact, even for the seemingly simpler problem of detecting a 4-cycle, the best known algorithm works in $\mathcal{O}(m^{4/3})$ time, and a conjecture of Yuster and Zwick implies that this might be optimal....
more | pdf | html
None.
###### Tweets
ComputerPapers: Computing Quartet Distance is Equivalent to Counting 4-Cycles. https://t.co/84UygnqcrO
None.
None.
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 15314
Unqiue Words: 2843

##### #6. Submodular Optimization Over Streams with Inhomogeneous Decays
###### Junzhou Zhao, Shuo Shang, Pinghui Wang, John C. S. Lui, Xiangliang Zhang
Cardinality constrained submodular function maximization, which aims to select a subset of size at most $k$ to maximize a monotone submodular utility function, is the key in many data mining and machine learning applications such as data summarization and maximum coverage problems. When data is given as a stream, streaming submodular optimization (SSO) techniques are desired. Existing SSO techniques can only apply to insertion-only streams where each element has an infinite lifespan, and sliding-window streams where each element has a same lifespan (i.e., window size). However, elements in some data streams may have arbitrary different lifespans, and this requires addressing SSO over streams with inhomogeneous-decays (SSO-ID). This work formulates the SSO-ID problem and presents three algorithms: BasicStreaming is a basic streaming algorithm that achieves an $(1/2-\epsilon)$ approximation factor; HistApprox improves the efficiency significantly and achieves an $(1/3-\epsilon)$ approximation factor; HistStreaming is a streaming...
more | pdf | html
None.
###### Tweets
nmfeeds: [O] https://t.co/dowvvw7B1s Submodular Optimization Over Streams with Inhomogeneous Decays. Cardinality constrained submod...
None.
None.
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 9249
Unqiue Words: 2089

##### #7. Quantum-inspired low-rank stochastic regression with logarithmic dependence on the dimension
###### András Gilyén, Seth Lloyd, Ewin Tang
We construct an efficient classical analogue of the quantum matrix inversion algorithm (HHL) for low-rank matrices. Inspired by recent work of Tang, assuming length-square sampling access to input data, we implement the pseudoinverse of a low-rank matrix and sample from the solution to the problem $Ax=b$ using fast sampling techniques. We implement the pseudo-inverse by finding an approximate singular value decomposition of $A$ via subsampling, then inverting the singular values. In principle, the approach can also be used to apply any desired "smooth" function to the singular values. Since many quantum algorithms can be expressed as a singular value transformation problem, our result suggests that more low-rank quantum algorithms can be effectively "dequantised" into classical length-square sampling algorithms.
more | pdf | html
None.
###### Tweets
lukOlejnik: Interesting result. A specific algorithm for a quantum computer ("low-rank stochastic regression") found to be efficiently solved by non-quantum computers. Not yet understood well which problems are best solved with quantum computers. https://t.co/ugGX81ga6m https://t.co/pr8CAPK6uy
QuantumMemeing: Fig. 1.26: A meme [1]. [1] Quantum Computing Memes for QMA-Complete Teens, Studies in Ancient Greek and Syriac Memeing (2018). Ewin Tang destroying the hopes and dreams of QMLers everywhere. Check out the paper here: https://t.co/jNo9XaPsRI https://t.co/oQwzeVkLyS
fgrosshans: .@ewintang is indeed on her way to dequantize everything: with https://t.co/EZ2M6ccYjk, HHL is dequantized. By the way, sorry for the first name typo and the gender error in the cited tweet. https://t.co/gAQrtarWea
yuyu_hf: また量子情報◯すマンの新作だ、、、 https://t.co/Ebas1imqAV
fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
ComputerPapers: Quantum-inspired low-rank stochastic regression with logarithmic dependence on the dimension. https://t.co/jDZCT9f2us
ewintang: New twin papers on quantum-inspired algorithms for low-rank matrix inversion: https://t.co/3T6Xx9G3hy and https://t.co/tUPCxbq0UA
octonion: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
makoto0218ne56: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
spidermanzano: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
Lukasaoz: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
nick_farina: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
postquantum: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
katelovesneuro: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
RichFelker: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
durumcrustulum: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
rdviii: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
gejikeiji: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
rbtcollins: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
iKodack: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
jpdowling: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
kamakiri_ys: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
MGimenoSegovia: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
aquintex: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
henryquantum: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
BulentKIzILtan: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
amy8492: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
matt_reagor: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
rayohauno: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
cocori_aqua: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
cocori_aqua: RT @yuyu_hf: また量子情報◯すマンの新作だ、、、 https://t.co/Ebas1imqAV
ilyaraz2: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
TilmaLabs: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
yoshi_and_aki: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
yoshi_and_aki: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
quantumbtc: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
AlyTarekIbrahim: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
ywyamashiro: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
wjmzbmr1: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
world_fantasia: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
akinori_kawachi: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
gshartnett: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
Deepneuron: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
DhammaKimpara: RT @QuantumHazzard: Wow -- the famous HHL algorithm doesn't actually need a quantum computer! https://t.co/5ZduTl0vcZ Another quantum algor…
ons_yy: RT @fgksk: うぉ。第三弾か。https://t.co/oMrsU9OB02
None.
None.
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 5438
Unqiue Words: 1428

##### #8. A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest
###### Neil Olver, Frans Schalekamp, Suzanne van der Ster, Leen Stougie, Anke van Zuylen
We give a 2-approximation algorithm for the Maximum Agreement Forest problem on two rooted binary trees. This NP-hard problem has been studied extensively in the past two decades, since it can be used to compute the rooted Subtree Prune-and-Regraft (rSPR) distance between two phylogenetic trees. Our algorithm is combinatorial and its running time is quadratic in the input size. To prove the approximation guarantee, we construct a feasible dual solution for a novel linear programming formulation. In addition, we show this linear program is stronger than previously known formulations, and we give a compact formulation, showing that it can be solved in polynomial time
more | pdf | html
None.
###### Tweets
ComputerPapers: A Duality Based 2-Approximation Algorithm for Maximum Agreement Forest. https://t.co/h5pRHTzcWo
None.
None.
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 17622
Unqiue Words: 2562

##### #9. Robustness of spectral methods for community detection
###### Ludovic Stephan, Laurent Massoulié
The present work is concerned with community detection. Specifically, we consider a random graph drawn according to the stochastic block model~: its vertex set is partitioned into blocks, or communities, and edges are placed randomly and independently of each other with probability depending only on the communities of their two endpoints. In this context, our aim is to recover the community labels better than by random guess, based only on the observation of the graph. In the sparse case, where edge probabilities are in $O(1/n)$, we introduce a new spectral method based on the distance matrix $D^{(l)}$, where $D^{(l)}_{ij} = 1$ iff the graph distance between $i$ and $j$, noted $d(i, j)$ is equal to $\ell$. We show that when $\ell \sim c\log(n)$ for carefully chosen $c$, the eigenvectors associated to the largest eigenvalues of $D^{(l)}$ provide enough information to perform non-trivial community recovery with high probability, provided we are above the so-called Kesten-Stigum threshold. This yields an efficient algorithm for...
more | pdf | html
None.
###### Tweets
ComputerPapers: Robustness of spectral methods for community detection. https://t.co/OYBNeoh6gW
None.
None.
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 8125
Unqiue Words: 1814

##### #10. Vectorized Character Counting for Faster Pattern Matching
###### Roman Snytsar
Many modern sequence alignment tools implement fast string matching using the space efficient data structure called FM-index. The succinct nature of this data structure presents unique challenges for the algorithm designers. In this paper, we explore the opportunities for parallelization of the exact and inexact matches and present an efficient SIMD solution for the Occ portion of the algorithm. Our implementation computes all eight Occ values required for the inexact match algorithm step in a single pass. We showcase the algorithm performance in a multi-core genome aligner and discuss effects of the memory prefetch.
more | pdf | html
None.
None.
None.
###### Other stats
Sample Sizes : None.
Authors: 1
Total Words: 3451
Unqiue Words: 1156

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 57,756 papers.

###### Search
Sort results based on if they are interesting or reproducible.
Interesting
Reproducible
Online
###### Stats
Tracking 57,756 papers.