Gradient boosting decision tree (GBDT) is a powerful and widely-used machine
learning model, which has achieved state-of-the-art performance in many
academic areas and production environment. However, communication overhead is
the main bottleneck in distributed training which can handle the massive data
nowadays. In this paper, we propose two novel communication-efficient methods
over distributed dataset to mitigate this problem, a weighted sampling approach
by which we can estimate the information gain over a small subset efficiently,
and distributed protocols for weighted quantile problem used in approximate
tree learning.

more |
pdf
| html
None.

arxiv_in_review:
#NeurIPS2019 Communication-Efficient Weighted Sampling and Quantile Summary for GBDT. (arXiv:1909.07633v1 [cs\.DS]) https://t.co/OAufxyP3AI

arxiv_cs_LG:
Communication-Efficient Weighted Sampling and Quantile Summary for GBDT. Ziyue Huang and Ke Yi https://t.co/DR9TXGXQ0l

okateim:
2019/09/18 [5]
Communication-Efficient Weighted Sampling and Quantile Summary for GBDT (https://t.co/0Kq4FR7IWO)

None.

None.

Sample Sizes : None.

Authors: 2

Total Words: 3029

Unqiue Words: 954

We study the problem of computing public transit traffic assignments in a
multi-modal setting: Given a public transit timetable, an additional
unrestricted transfer mode (in our case walking), and a set of
origin-destination pairs, we aim to compute the utilization of every vehicle in
the network. While it has been shown that considering unrestricted transfers
can significantly improve journeys, computing such journeys efficiently remains
algorithmically challenging. Since traffic assignments require the computation
of millions of shortest paths, using a multi-modal network has previously not
been feasible. A recently proposed approach (ULTRA) enables efficient
algorithms with UnLimited TRAnsfers at the cost of a short preprocessing phase.
In this work we combine the ULTRA approach with a state-of-the-art assignment
algorithm, making multi-modal assignments practical. Careful algorithm
engineering results in a new public transit traffic assignment algorithm that
even outperforms the algorithm it builds upon, while enabling...

more |
pdf
| html
okateim:
2019/09/19 [1]
Efficient Computation of Multi-Modal Public Transit Traffic Assignments using ULTRA (https://t.co/sELT8ceL5r)

None.

None.

Sample Sizes : None.

Authors: 3

Total Words: 9920

Unqiue Words: 2376

Maximum Independent Set (MIS for short) is in general graphs the paradigmatic
$W[1]$-hard problem. In stark contrast, polynomial-time algorithms are known
when the inputs are restricted to structured graph classes such as, for
instance, perfect graphs (which includes bipartite graphs, chordal graphs,
co-graphs, etc.) or claw-free graphs. In this paper, we introduce some variants
of co-graphs with parameterized noise, that is, graphs that can be made into
disjoint unions or complete sums by the removal of a certain number of vertices
and the addition/deletion of a certain number of edges per incident vertex,
both controlled by the parameter. We give a series of FPT Turing-reductions on
these classes and use them to make some progress on the parameterized
complexity of MIS in $H$-free graphs. We show that for every fixed $t \geqslant
1$, MIS is FPT in $P(1,t,t,t)$-free graphs, where $P(1,t,t,t)$ is the graph
obtained by substituting all the vertices of a four-vertex path but one end of
the path by cliques of size $t$. We also...

more |
pdf
| html
None.

okateim:
2019/09/19 [2]
When Maximum Stable Set can be solved in FPT time (https://t.co/jVQJtfpZUT)

None.

None.

Sample Sizes : None.

Authors: 4

Total Words: 23281

Unqiue Words: 3518

The Lovasz Local Lemma (LLL) is a keystone principle in probability theory,
guaranteeing the existence of configurations which avoid a collection $\mathcal
B$ of "bad" events which are mostly independent and have low probability. In
its simplest "symmetric" form, it asserts that whenever a bad-event has
probability $p$ and affects at most $d$ bad-events, and $e p d < 1$, then a
configuration avoiding all $\mathcal B$ exists.
A seminal algorithm of Moser & Tardos (2010) gives nearly-automatic
randomized algorithms for most constructions based on the LLL. However,
deterministic algorithms have lagged behind. We address three specific
shortcomings of the prior deterministic algorithms. First, our algorithm
applies to the LLL criterion of Shearer (1985); this is more powerful than
alternate LLL criteria and also removes a number of nuisance parameters and
leads to cleaner and more legible bounds. Second, we provide parallel
algorithms with much greater flexibility in the functional form of of the
bad-events. Third, we provide a...

more |
pdf
| html
None.

okateim:
2019/09/19 [4]
Deterministic algorithms for the Lovasz Local Lemma: simpler, more general, and more parallel (https://t.co/Go8ufqr0qz)

None.

None.

Sample Sizes : None.

Authors: 1

Total Words: 0

Unqiue Words: 0

We suggest a reduction of the combinatorial problem of hypergraph
partitioning to a continuous optimization problem.

more |
pdf
| html
None.

okateim:
2019/09/19 [3]
Hypergraph partitions (https://t.co/anQgvpll3Q)

None.

None.

Sample Sizes : None.

Authors: 4

Total Words: 1625

Unqiue Words: 737

For a graph $G$, let $\Omega(G)$ denote the set of all potential maximal
cliques of $G$. For each subset $\Omega$ of $\Omega(G)$, let $\tw(G, \Omega)$
denote the smallest $k$ such that there is a tree-decomposition of $G$ of width
$k$ whose bags all belong to $\Omega$. Bouchitt\'{e} and Todinca observed in
2001 that $\tw(G, \Omega(G))$ is exactly the treewidth of $G$ and developed a
dynamic programming algorithm to compute it. Indeed, their algorithm can
readily be applied to an arbitrary non-empty subset $\Omega$ of $\Omega(G)$ and
computes $\tw(G, \Omega)$, or reports that it is undefined, in time
$|\Omega||V(G)|^{O(1)}$. This efficient tool for computing $\tw(G, \Omega)$
allows us to conceive of an iterative improvement procedure for treewidth upper
bounds which maintains, as the current solution, a set of potential maximal
cliques rather than a tree-decomposition.
We design and implement an algorithm along this approach. Experiments show
that our algorithm vastly outperforms previously implemented heuristic
algorithms for treewidth.

more |
pdf
| html
None.

okateim:
2019/09/18 [4]
A heuristic use of dynamic programming to upperbound treewidth (https://t.co/3k9HmG76qp)

None.

None.

Sample Sizes : None.

Authors: 1

Total Words: 0

Unqiue Words: 0

We generalise the results of Bhattacharya et al. (Journal of Computing
Systems, 62(1):93-115, 2018) for the list-$k$-means problem defined as -- for a
(unknown) partition $X_1, ..., X_k$ of the dataset $X \subseteq \mathbb{R}^d$,
find a list of $k$-center sets (each element in the list is a set of $k$
centers) such that at least one of $k$-center sets $\{c_1, ..., c_k\}$ in the
list gives an $(1+\varepsilon)$-approximation with respect to the cost function
$\min_{\textrm{permutation } \pi} \left[ \sum_{i=1}^{k} \sum_{x \in X_i} ||x -
c_{\pi(i)}||^2 \right]$. The list-$k$-means problem is important for the
constrained $k$-means problem since algorithms for the former can be converted
to PTAS for various versions of the latter. Following are the consequences of
our generalisations:
- Streaming algorithm: Our $D^2$-sampling based algorithm running in a single
iteration allows us to design a 2-pass, logspace streaming algorithm for the
list-$k$-means problem. This can be converted to a 4-pass, logspace streaming
PTAS for various...

more |
pdf
| html
None.

okateim:
2019/09/18 [9]
Streaming PTAS for Constrained k-Means (https://t.co/ISfJkRpCet)

None.

None.

Sample Sizes : None.

Authors: 3

Total Words: 0

Unqiue Words: 0

Let $G=(V,E)$ be a multigraph with a set $T\subseteq V$ of terminals. A path
in $G$ is called a $T$-path if its ends are distinct vertices in $T$ and no
internal vertices belong to $T$. In 1978, Mader showed a characterization of
the maximum number of edge-disjoint $T$-paths. The original proof was not
constructive, and hence it did not suggest an efficient algorithm.
In this paper, we provide a combinatorial, deterministic algorithm for
finding the maximum number of edge-disjoint $T$-paths. The algorithm adopts an
augmenting path approach. More specifically, we introduce a novel concept of
augmenting walks in auxiliary labeled graphs to capture a possible augmentation
of the number of edge-disjoint $T$-paths. To design a search procedure for an
augmenting walk, we introduce blossoms analogously to the matching algorithm of
Edmonds (1965), while it is neither a special case nor a generalization of the
present problem. When the search procedure terminates without finding an
augmenting walk, the algorithm provides a certificate...

more |
pdf
| html
None.

okateim:
2019/09/18 [1]
Combinatorial Algorithms for Edge-Disjoint $T$-Paths and Integer Free Multiflow (https://t.co/kgmKgo276E)

None.

None.

Sample Sizes : None.

Authors: 2

Total Words: 11710

Unqiue Words: 1824

Recent years are characterized by an unprecedented quantity of available
network data which are produced at an astonishing rate by an heterogeneous
variety of interconnected sensors and devices. This high-throughput generation
calls for the development of new effective methods to store, retrieve,
understand and process massive network data. In this thesis, we tackle this
challenge by introducing a framework to summarize large graphs based on
Szemer\'edi's Regularity Remma (RL), which roughly states that any sufficiently
large graph can almost entirely be partitioned into a bounded number of
random-like bipartite graphs. The partition resulting from the RL gives rise to
a summary, which inherits many of the essential structural properties of the
original graph. We first extend an heuristic version of the RL to improve its
efficiency and its robustness. We use the proposed algorithm to address
graph-based clustering and image segmentation tasks. In the second part of the
thesis, we introduce a new heuristic algorithm which is...

more |
pdf
| html
None.

okateim:
2019/09/18 [11]
Regular Partitions and Their Use in Structural Pattern Recognition (https://t.co/Hko8KIEVNC)

None.

None.

Sample Sizes : None.

Authors: 1

Total Words: 0

Unqiue Words: 0

The \textsc{Housing Market} problem is a widely studied resource allocation
problem. In this problem, each agent can only receive a single object and has
preferences over all objects. Starting from an initial endowment, we want to
reach a certain assignment via a sequence of rational trades. We first consider
whether an object is reachable for a given agent under a social network, where
a trade between two agents is allowed if they are neighbors in the network and
no participant has a deficit from the trade. Assume that the preferences of the
agents are strict (no tie among objects is allowed). This problem is
polynomially solvable in a star-network and NP-complete in a tree-network. It
is left as a challenging open problem whether the problem is polynomially
solvable when the network is a path. We answer this open problem positively by
giving a polynomial-time algorithm. Then we show that when the preferences of
the agents are weak (ties among objects are allowed), the problem becomes
NP-hard even when the network is a path. In...

more |
pdf
| html
None.

okateim:
2019/09/18 [6]
Object Reachability via Swaps under Strict and Weak Preferences (https://t.co/pPlQbalhmo)

None.

None.

Sample Sizes : None.

Authors: 2

Total Words: 15776

Unqiue Words: 1939

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

*Tracking 192,929 papers.*

Sort results based on if they are interesting or reproducible.

Interesting

Reproducible