Top 10 Arxiv Papers Today in Data Structures And Algorithms


2.026 Mikeys
#1. S-RASTER: Contraction Clustering for Evolving Data Streams
Gregor Ulm, Simon Smith, Adrian Nilsson, Emil Gustavsson, Mats Jirstrand
Contraction Clustering (RASTER) is a very fast algorithm for density-based clustering, which requires only a single pass. It can process arbitrary amounts of data in linear time and in constant memory, quickly identifying approximate clusters. It also exhibits good scalability in the presence of multiple CPU cores. Yet, RASTER is limited to batch processing. In contrast, S-RASTER is an adaptation of RASTER to the stream processing paradigm that is able to identify clusters in evolving data streams. This algorithm retains the main benefits of its parent algorithm, i.e. single-pass linear time cost and constant memory requirements for each discrete time step in the sliding window. The sliding window is efficiently pruned, and clustering is still performed in linear time. Like RASTER, S-RASTER trades off an often negligible amount of precision for speed. It is therefore very well suited to real-world scenarios where clustering does not happen continually but only periodically. We describe the algorithm, including a discussion of...
more | pdf | html
Figures
None.
Tweets
arxiv_cs_LG: S-RASTER: Contraction Clustering for Evolving Data Streams. Gregor Ulm, Simon Smith, Adrian Nilsson, Emil Gustavsson, and Mats Jirstrand https://t.co/9YS6UAvpBt
Memoirs: S-RASTER: Contraction Clustering for Evolving Data Streams. https://t.co/vPjdsXnBbm
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 5320
Unqiue Words: 1625

2.007 Mikeys
#2. Energy consumption in compact integer vectors: A study case
José Fuentes-Sepúlveda, Susana Ladra
In the field of algorithms and data structures analysis and design, most of the researchers focus only on the space/time trade-off, and little attention has been paid to energy consumption. Moreover, most of the efforts in the field of Green Computing have been devoted to hardware-related issues, being green software in its infancy. Optimizing the usage of computing resources, minimizing power consumption or increasing battery life are some of the goals of this field of research. As an attempt to address the most recent sustainability challenges, we must incorporate the energy consumption as a first-class constraint when designing new compact data structures. Thus, as a preliminary work to reach that goal, we first need to understand the factors that impact on the energy consumption and their relation with compression. In this work, we study the energy consumption required by several integer vector representations. We execute typical operations over datasets of different nature. We can see that, as commonly believed, energy...
more | pdf | html
Figures
None.
Tweets
powturbo: 🆕🔥🗜️"Energy consumption in compact integer vectors: A study case" 🔗 https://t.co/DGLI1NP3Ic #⃣ #DataCompression #IntegerCompression #DataStructures #Algorithms #EnergyConsumption https://t.co/TFoxg7lPb7
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unqiue Words: 0

2.002 Mikeys
#3. The Power of Factorization Mechanisms in Local and Central Differential Privacy
Alexander Edmonds, Aleksandar Nikolov, Jonathan Ullman
We give new characterizations of the sample complexity of answering linear queries (statistical queries) in the local and central models of differential privacy: *In the non-interactive local model, we give the first approximate characterization of the sample complexity. Informally our bounds are tight to within polylogarithmic factors in the number of queries and desired accuracy. Our characterization extends to agnostic learning in the local model. *In the central model, we give a characterization of the sample complexity in the high-accuracy regime that is analogous to that of Nikolov, Talwar, and Zhang (STOC 2013), but is both quantitatively tighter and has a dramatically simpler proof. Our lower bounds apply equally to the empirical and population estimation problems. In both cases, our characterizations show that a particular factorization mechanism is approximately optimal, and the optimal sample complexity is bounded from above and below by well studied factorization norms of a matrix associated with the queries.
more | pdf | html
Figures
None.
Tweets
ccanonne_: @AcharyaJayadev @thesasho I *knew* this would happen. Just not that soon. https://t.co/u3mFllhr6j The list grows. I'll end up buried under a stack of interesting papers. (Joke aside: this does look great, @thesasho.) https://t.co/XsEZfHKK08
thesasho: After years of talking about DP, Jon Ullman and I finally wrote a paper together https://t.co/WIaThvAFql. Also first paper with Alex Edmonds, supervised by me and Toni Pitassi. We give an essentially optimal algorithm to answer any set of statistical queries in local DP.
Memoirs: The Power of Factorization Mechanisms in Local and Central Differential Privacy. https://t.co/IskrNmyvTo
heghbalz: RT @ccanonne_: @AcharyaJayadev @thesasho I *knew* this would happen. Just not that soon. https://t.co/u3mFllhr6j The list grows. I'll end u…
bipr: RT @thesasho: After years of talking about DP, Jon Ullman and I finally wrote a paper together https://t.co/WIaThvAFql. Also first paper wi…
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unqiue Words: 0

2.0 Mikeys
#4. Consistent recovery threshold of hidden nearest neighbor graphs
Jian Ding, Yihong Wu, Jiaming Xu, Dana Yang
Motivated by applications such as discovering strong ties in social networks and assembling genome subsequences in biology, we study the problem of recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete graph, whose edge weights are independent and distributed according to $P_n$ for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of Bernoulli distributions corresponds to a variant of the Watts-Strogatz small-world graph. We focus on two types of asymptotic recovery guarantees as $n\to \infty$: (1) exact recovery: all edges are classified correctly with probability tending to one; (2) almost exact recovery: the expected number of misclassified edges is $o(nk)$. We show that the maximum likelihood estimator achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $ \liminf \frac{2\alpha_n}{\log n}>1$; (2) almost exact recovery for $ 1 \le k \le o\left( \frac{\log n}{\log \log n} \right)$ if $\liminf \frac{kD(P_n||Q_n)}{\log n}>1$, where $\alpha_n \triangleq -2 \log \int \sqrt{d P_n d...
more | pdf | html
Figures
None.
Tweets
Memoirs: Consistent recovery threshold of hidden nearest neighbor graphs. https://t.co/dIGIwjgbEv
SRoyLee: Consistent recovery threshold of hidden nearest neighbor graphs - https://t.co/0nV3TOALuA
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unqiue Words: 0

1.998 Mikeys
#5. Improved Compressed String Dictionaries
Nieves R. Brisaboa, Ana Cerdeira-Pena, Guillermo de Bernardo, Gonzalo Navarro
We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix computation in suffix arrays. Our data structures yield relevant space-time tradeoffs in real-world dictionaries. We focus on two domains where string dictionaries are extensively used and efficient compression is required: URL collections, a key element in Web graphs and applications such as Web mining; and collections of URIs and literals, the basic components of RDF datasets. Our experiments show that our data structures achieve better compression than the state-of-the-art alternatives while providing very competitive query times.
more | pdf | html
Figures
None.
Tweets
powturbo: 🆕🗜️"Improved Compressed String Dictionaries" new family of CDS to efficiently store and query large string dictionaries in main memory... 📄https://t.co/Xk8PrxVAok #⃣ #DataCompression #IntegerCompression #InformationRetrieval #DataStructures #Algorithms https://t.co/rXIPsxlnMd
okateim: Improved Compressed String Dictionaries https://t.co/YALkWfyOgC
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 0
Unqiue Words: 0

1.998 Mikeys
#6. Learning-Assisted Competitive Algorithms for Peak-Aware Energy Scheduling
Russell Lee, Mohammad H. Hajiesmaili, Jian Li
In this paper, we study the peak-aware energy scheduling problem using the competitive framework with machine learning prediction. With the uncertainty of energy demand as the fundamental challenge, the goal is to schedule the energy output of local generation units such that the electricity bill is minimized. While this problem has been tackled using classic competitive design with worst-case guarantee, the goal of this paper is to develop learning-assisted competitive algorithms to improve the performance in a provable manner. We develop two deterministic and randomized algorithms that are provably robust against the poor performance of learning prediction, however, achieve the optimal performance as the error of prediction goes to zero. Extensive experiments using real data traces verify our theoretical observations and show 15.13% improved performance against pure online algorithms.
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 10370
Unqiue Words: 2022

1.998 Mikeys
#7. Property Testing of LP-Type Problems
Rogers Epstein, Sandeep Silwal
Given query access to a set of constraints $S$, we wish to quickly check if some objective function $\varphi$ subject to these constraints is at most a given value $k$. We approach this problem using the framework of property testing where our goal is to distinguish the case $\varphi(S) \le k$ from the case that at least an $\epsilon$ fraction of the constraints in $S$ need to be removed for $\varphi(S) \le k$ to hold. We restrict our attention to the case where $(S, \varphi)$ are LP-Type problems which is a rich family of combinatorial optimization problems with an inherent geometric structure. By utilizing a simple sampling procedure which has been used previously to study these problems, we are able to create property testers for any LP-Type problem whose query complexities are independent of the number of constraints. To the best of our knowledge, this is the first work that connects the area of LP-Type problems and property testing in a systematic way. Among our results is a tight upper bound on the query complexity of...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unqiue Words: 0

1.998 Mikeys
#8. Optimal Single-Choice Prophet Inequalities from Samples
Aviad Rubinstein, Jack Z. Wang, S. Matthew Weinberg
We study the single-choice Prophet Inequality problem when the gambler is given access to samples. We show that the optimal competitive ratio of $1/2$ can be achieved with a single sample from each distribution. When the distributions are identical, we show that for any constant $\varepsilon > 0$, $O(n)$ samples from the distribution suffice to achieve the optimal competitive ratio ($\approx 0.745$) within $(1+\varepsilon)$, resolving an open problem of Correa, D\"utting, Fischer, and Schewior.
more | pdf | html
Figures
None.
Tweets
DO: Optimal Single-Choice Prophet Inequalities from Samples. https://t.co/bvp49WvhGZ
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unqiue Words: 0

1.998 Mikeys
#9. Low-Rank Toeplitz Matrix Estimation via Random Ultra-Sparse Rulers
Hannah Lawrence, Jerry Li, Cameron Musco, Christopher Musco
We study how to estimate a nearly low-rank Toeplitz covariance matrix $T$ from compressed measurements. Recent work of Qiao and Pal addresses this problem by combining sparse rulers (sparse linear arrays) with frequency finding (sparse Fourier transform) algorithms applied to the Vandermonde decomposition of $T$. Analytical bounds on the sample complexity are shown, under the assumption of sufficiently large gaps between the frequencies in this decomposition. In this work, we introduce random ultra-sparse rulers and propose an improved approach based on these objects. Our random rulers effectively apply a random permutation to the frequencies in $T$'s Vandermonde decomposition, letting us avoid frequency gap assumptions and leading to improved sample complexity bounds. In the special case when $T$ is circulant, we theoretically analyze the performance of our method when combined with sparse Fourier transform algorithms based on random hashing. We also show experimentally that our ultra-sparse rulers give significantly more robust...
more | pdf | html
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 5386
Unqiue Words: 1720

1.998 Mikeys
#10. Outlier-Robust High-Dimensional Sparse Estimation via Iterative Filtering
Ilias Diakonikolas, Sushrut Karmalkar, Daniel Kane, Eric Price, Alistair Stewart
We study high-dimensional sparse estimation tasks in a robust setting where a constant fraction of the dataset is adversarially corrupted. Specifically, we focus on the fundamental problems of robust sparse mean estimation and robust sparse PCA. We give the first practically viable robust estimators for these problems. In more detail, our algorithms are sample and computationally efficient and achieve near-optimal robustness guarantees. In contrast to prior provable algorithms which relied on the ellipsoid method, our algorithms use spectral techniques to iteratively remove outliers from the dataset. Our experimental evaluation on synthetic data shows that our algorithms are scalable and significantly outperform a range of previous approaches, nearly matching the best error rate without corruptions.
more | pdf | html
Figures
None.
Tweets
StatsPapers: Outlier-Robust High-Dimensional Sparse Estimation via Iterative Filtering. https://t.co/6a9KTJxHP1
paulportesi: RT @StatsPapers: Outlier-Robust High-Dimensional Sparse Estimation via Iterative Filtering. https://t.co/6a9KTJxHP1
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 15097
Unqiue Words: 3010

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 226,490 papers.

Search
Sort results based on if they are interesting or reproducible.
Interesting
Reproducible
Categories
All
Astrophysics
Cosmology and Nongalactic Astrophysics
Earth and Planetary Astrophysics
Astrophysics of Galaxies
High Energy Astrophysical Phenomena
Instrumentation and Methods for Astrophysics
Solar and Stellar Astrophysics
Condensed Matter
Disordered Systems and Neural Networks
Mesoscale and Nanoscale Physics
Materials Science
Other Condensed Matter
Quantum Gases
Soft Condensed Matter
Statistical Mechanics
Strongly Correlated Electrons
Superconductivity
Computer Science
Artificial Intelligence
Hardware Architecture
Computational Complexity
Computational Engineering, Finance, and Science
Computational Geometry
Computation and Language
Cryptography and Security
Computer Vision and Pattern Recognition
Computers and Society
Databases
Distributed, Parallel, and Cluster Computing
Digital Libraries
Discrete Mathematics
Data Structures and Algorithms
Emerging Technologies
Formal Languages and Automata Theory
General Literature
Graphics
Computer Science and Game Theory
Human-Computer Interaction
Information Retrieval
Information Theory
Machine Learning
Logic in Computer Science
Multiagent Systems
Multimedia
Mathematical Software
Numerical Analysis
Neural and Evolutionary Computing
Networking and Internet Architecture
Other Computer Science
Operating Systems
Performance
Programming Languages
Robotics
Symbolic Computation
Sound
Software Engineering
Social and Information Networks
Systems and Control
Economics
Econometrics
General Economics
Theoretical Economics
Electrical Engineering and Systems Science
Audio and Speech Processing
Image and Video Processing
Signal Processing
General Relativity and Quantum Cosmology
General Relativity and Quantum Cosmology
High Energy Physics - Experiment
High Energy Physics - Experiment
High Energy Physics - Lattice
High Energy Physics - Lattice
High Energy Physics - Phenomenology
High Energy Physics - Phenomenology
High Energy Physics - Theory
High Energy Physics - Theory
Mathematics
Commutative Algebra
Algebraic Geometry
Analysis of PDEs
Algebraic Topology
Classical Analysis and ODEs
Combinatorics
Category Theory
Complex Variables
Differential Geometry
Dynamical Systems
Functional Analysis
General Mathematics
General Topology
Group Theory
Geometric Topology
History and Overview
Information Theory
K-Theory and Homology
Logic
Metric Geometry
Mathematical Physics
Numerical Analysis
Number Theory
Operator Algebras
Optimization and Control
Probability
Quantum Algebra
Rings and Algebras
Representation Theory
Symplectic Geometry
Spectral Theory
Statistics Theory
Mathematical Physics
Mathematical Physics
Nonlinear Sciences
Adaptation and Self-Organizing Systems
Chaotic Dynamics
Cellular Automata and Lattice Gases
Pattern Formation and Solitons
Exactly Solvable and Integrable Systems
Nuclear Experiment
Nuclear Experiment
Nuclear Theory
Nuclear Theory
Physics
Accelerator Physics
Atmospheric and Oceanic Physics
Applied Physics
Atomic and Molecular Clusters
Atomic Physics
Biological Physics
Chemical Physics
Classical Physics
Computational Physics
Data Analysis, Statistics and Probability
Physics Education
Fluid Dynamics
General Physics
Geophysics
History and Philosophy of Physics
Instrumentation and Detectors
Medical Physics
Optics
Plasma Physics
Popular Physics
Physics and Society
Space Physics
Quantitative Biology
Biomolecules
Cell Behavior
Genomics
Molecular Networks
Neurons and Cognition
Other Quantitative Biology
Populations and Evolution
Quantitative Methods
Subcellular Processes
Tissues and Organs
Quantitative Finance
Computational Finance
Economics
General Finance
Mathematical Finance
Portfolio Management
Pricing of Securities
Risk Management
Statistical Finance
Trading and Market Microstructure
Quantum Physics
Quantum Physics
Statistics
Applications
Computation
Methodology
Machine Learning
Other Statistics
Statistics Theory
Feedback
Online
Stats
Tracking 226,490 papers.