We give new characterizations of the sample complexity of answering linear
queries (statistical queries) in the local and central models of differential
privacy:
*In the non-interactive local model, we give the first approximate
characterization of the sample complexity. Informally our bounds are tight to
within polylogarithmic factors in the number of queries and desired accuracy.
Our characterization extends to agnostic learning in the local model.
*In the central model, we give a characterization of the sample complexity in
the high-accuracy regime that is analogous to that of Nikolov, Talwar, and
Zhang (STOC 2013), but is both quantitatively tighter and has a dramatically
simpler proof.
Our lower bounds apply equally to the empirical and population estimation
problems. In both cases, our characterizations show that a particular
factorization mechanism is approximately optimal, and the optimal sample
complexity is bounded from above and below by well studied factorization norms
of a matrix associated with the queries.

more |
pdf
| html
None.

ccanonne_:
@AcharyaJayadev @thesasho I *knew* this would happen. Just not that soon.
https://t.co/u3mFllhr6j
The list grows. I'll end up buried under a stack of interesting papers.
(Joke aside: this does look great, @thesasho.) https://t.co/XsEZfHKK08

thesasho:
After years of talking about DP, Jon Ullman and I finally wrote a paper together https://t.co/WIaThvAFql. Also first paper with Alex Edmonds, supervised by me and Toni Pitassi. We give an essentially optimal algorithm to answer any set of statistical queries in local DP.

Memoirs:
The Power of Factorization Mechanisms in Local and Central Differential Privacy. https://t.co/IskrNmyvTo

heghbalz:
RT @ccanonne_: @AcharyaJayadev @thesasho I *knew* this would happen. Just not that soon.
https://t.co/u3mFllhr6j
The list grows. I'll end u…

bipr:
RT @thesasho: After years of talking about DP, Jon Ullman and I finally wrote a paper together https://t.co/WIaThvAFql. Also first paper wi…

None.

None.

Sample Sizes : None.

Authors: 3

Total Words: 0

Unqiue Words: 0

We introduce a new family of compressed data structures to efficiently store
and query large string dictionaries in main memory. Our main technique is a
combination of hierarchical Front-coding with ideas from longest-common-prefix
computation in suffix arrays. Our data structures yield relevant space-time
tradeoffs in real-world dictionaries. We focus on two domains where string
dictionaries are extensively used and efficient compression is required: URL
collections, a key element in Web graphs and applications such as Web mining;
and collections of URIs and literals, the basic components of RDF datasets. Our
experiments show that our data structures achieve better compression than the
state-of-the-art alternatives while providing very competitive query times.

more |
pdf
| html
None.

powturbo:
🆕🗜️"Improved Compressed String Dictionaries" new family of CDS to efficiently store and query large string dictionaries in main memory...
📄https://t.co/Xk8PrxVAok
#⃣ #DataCompression #IntegerCompression #InformationRetrieval #DataStructures #Algorithms https://t.co/rXIPsxlnMd

okateim:
Improved Compressed String Dictionaries
https://t.co/YALkWfyOgC

None.

None.

Sample Sizes : None.

Authors: 4

Total Words: 0

Unqiue Words: 0

Motivated by applications such as discovering strong ties in social networks
and assembling genome subsequences in biology, we study the problem of
recovering a hidden $2k$-nearest neighbor (NN) graph in an $n$-vertex complete
graph, whose edge weights are independent and distributed according to $P_n$
for edges in the hidden $2k$-NN graph and $Q_n$ otherwise. The special case of
Bernoulli distributions corresponds to a variant of the Watts-Strogatz
small-world graph. We focus on two types of asymptotic recovery guarantees as
$n\to \infty$: (1) exact recovery: all edges are classified correctly with
probability tending to one; (2) almost exact recovery: the expected number of
misclassified edges is $o(nk)$. We show that the maximum likelihood estimator
achieves (1) exact recovery for $2 \le k \le n^{o(1)}$ if $ \liminf
\frac{2\alpha_n}{\log n}>1$; (2) almost exact recovery for $ 1 \le k \le
o\left( \frac{\log n}{\log \log n} \right)$ if $\liminf
\frac{kD(P_n||Q_n)}{\log n}>1$, where $\alpha_n \triangleq -2 \log \int \sqrt{d
P_n d...

more |
pdf
| html
None.

Memoirs:
Consistent recovery threshold of hidden nearest neighbor graphs. https://t.co/dIGIwjgbEv

SRoyLee:
Consistent recovery threshold of hidden nearest neighbor graphs - https://t.co/0nV3TOALuA

None.

None.

Sample Sizes : None.

Authors: 4

Total Words: 0

Unqiue Words: 0

We study high-dimensional sparse estimation tasks in a robust setting where a
constant fraction of the dataset is adversarially corrupted. Specifically, we
focus on the fundamental problems of robust sparse mean estimation and robust
sparse PCA. We give the first practically viable robust estimators for these
problems. In more detail, our algorithms are sample and computationally
efficient and achieve near-optimal robustness guarantees. In contrast to prior
provable algorithms which relied on the ellipsoid method, our algorithms use
spectral techniques to iteratively remove outliers from the dataset. Our
experimental evaluation on synthetic data shows that our algorithms are
scalable and significantly outperform a range of previous approaches, nearly
matching the best error rate without corruptions.

more |
pdf
| html
None.

StatsPapers:
Outlier-Robust High-Dimensional Sparse Estimation via Iterative Filtering. https://t.co/6a9KTJxHP1

paulportesi:
RT @StatsPapers: Outlier-Robust High-Dimensional Sparse Estimation via Iterative Filtering. https://t.co/6a9KTJxHP1

None.

None.

Sample Sizes : None.

Authors: 5

Total Words: 15097

Unqiue Words: 3010

The raster model is commonly used for the representation of images in many
domains, and is especially useful in Geographic Information Systems (GIS) to
store information about continuous variables of the space (elevation,
temperature, etc.). Current representations of raster data are usually designed
for external memory or, when stored in main memory, lack efficient query
capabilities. In this paper we propose compact representations to efficiently
store and query raster datasets in main memory. We present different
representations for binary raster data, general raster data and time-evolving
raster data. We experimentally compare our proposals with traditional storage
mechanisms such as linear quadtrees or compressed GeoTIFF files. Results show
that our structures are up to 10 times smaller than classical linear quadtrees,
and even comparable in space to non-querieable representations of raster data,
while efficiently answering a number of typical queries.

more |
pdf
| html
None.

dbworld_:
https://t.co/gjJY8w8bhq Extending General Compact Querieable Representations to GIS Applications. (arXiv:1911.08376v1 [cs.DS]) #databases

None.

None.

Sample Sizes : None.

Authors: 5

Total Words: 14048

Unqiue Words: 2959

We study the single-choice Prophet Inequality problem when the gambler is
given access to samples. We show that the optimal competitive ratio of $1/2$
can be achieved with a single sample from each distribution. When the
distributions are identical, we show that for any constant $\varepsilon > 0$,
$O(n)$ samples from the distribution suffice to achieve the optimal competitive
ratio ($\approx 0.745$) within $(1+\varepsilon)$, resolving an open problem of
Correa, D\"utting, Fischer, and Schewior.

more |
pdf
| html
None.

DO:
Optimal Single-Choice Prophet Inequalities from Samples. https://t.co/bvp49WvhGZ

None.

None.

Sample Sizes : None.

Authors: 3

Total Words: 0

Unqiue Words: 0

Given graphs $G$ and $H$, we propose a method to implicitly enumerate
topological-minor-embeddings of $H$ in $G$ using decision diagrams. We show a
useful application of our method to enumerating subgraphs characterized by
forbidden topological minors, that is, planar, outerplanar, series-parallel,
and cactus subgraphs. Computational experiments show that our method can find
all planar subgraphs in a given graph at most five orders of magnitude faster
than a naive backtracking-based method.

more |
pdf
| html
None.

None.

None.

Sample Sizes : None.

Authors: 4

Total Words: 0

Unqiue Words: 0

Network performance problems are notoriously difficult to diagnose. Prior
profiling systems collect performance statistics by keeping information about
each network flow, but maintaining per-flow state is not scalable on
resource-constrained NIC and switch hardware. Instead, we propose sketch-based
performance monitoring using memory that is sublinear in the number of flows.
Existing sketches estimate flow monitoring metrics based on flow sizes. In
contrast, performance monitoring typically requires combining information
across pairs of packets, such as matching a data packet with its acknowledgment
to compute a round-trip time. We define a new class of \emph{lean} algorithms
that use memory sublinear in both the size of input data and the number of
flows. We then introduce lean algorithms for a set of important statistics,
such as identifying flows with high latency, loss, out-of-order, or
retransmitted packets. We implement prototypes of our lean algorithms on a
commodity programmable switch using the P4 language. Our...

more |
pdf
| html
None.

None.

None.

Sample Sizes : None.

Authors: 5

Total Words: 0

Unqiue Words: 0

In many combinatorial games, one can prove that the first player wins under
best play using a simple but non-constructive argument called
strategy-stealing. This work is about the complexity behind these proofs: how
hard is it to actually find a winning move in a game, when you know by
strategy-stealing that one exists? We prove that this problem is PSPACE-hard
already for Minimum Poset Games and Symmetric Maker-Maker Games, which are
simple classes of games that capture two of the main types of strategy-stealing
arguments in the current literature.

more |
pdf
| html
None.

None.

None.

Sample Sizes : None.

Authors: 2

Total Words: 0

Unqiue Words: 0

GraphBLAS is an interface for implementing graph algorithms. Algorithms
implemented using the GraphBLAS interface are cast in terms of linear
algebra-like operations. However, many graph algorithms are canonically
described in terms of operations on vertices and/or edges. Despite the known
duality between these two representations, the differences in the way
algorithms are described using the two approaches can pose considerable
difficulties in the adoption of the GraphBLAS as standard interface for
development. This paper investigates a systematic approach for translating a
graph algorithm described in the canonical vertex and edge representation into
an implementation that leverages the GraphBLAS interface. We present a two-step
approach to this problem. First, we express common vertex- and edge-centric
design patterns using a linear algebraic language. Second, we map this
intermediate representation to the GraphBLAS interface. We illustrate our
approach by translating the delta-stepping single source shortest path
algorithm...

more |
pdf
| html
None.

None.

None.

Sample Sizes : None.

Authors: 6

Total Words: 0

Unqiue Words: 0

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

*Tracking 225,763 papers.*

Sort results based on if they are interesting or reproducible.

Interesting

Reproducible