We introduce a novel distributed derivative-free optimization framework that
is resilient to stragglers. The proposed method employs coded search directions
at which the objective function is evaluated, and a decoding step to find the
next iterate. Our framework can be seen as an extension of evolution strategies
and structured exploration methods where structured search directions were
utilized. As an application, we consider black-box adversarial attacks on deep
convolutional neural networks. Our numerical experiments demonstrate a
significant improvement in computation time.
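
To make the setting concrete, here is a minimal sketch of one derivative-free step in which some workers straggle. It is an illustration under stated assumptions, not the coded scheme described in the abstract: it uses a plain antithetic evolution-strategies estimate and simply discards straggler results instead of decoding them. The names `es_step`, `num_workers`, `straggler_prob`, `sigma`, and `lr` are hypothetical.

```python
import random


def es_step(f, x, num_workers=8, sigma=0.1, lr=0.05, straggler_prob=0.25, rng=None):
    """One evolution-strategies step that ignores straggling workers.

    Assumption: each worker evaluates f at x + sigma*d and x - sigma*d for its
    own random direction d. Stragglers are simulated by a coin flip and their
    results are dropped (no coding or decoding is performed here).
    """
    rng = rng or random.Random(0)
    n = len(x)
    grad = [0.0] * n
    used = 0
    for _ in range(num_workers):
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if rng.random() < straggler_prob:
            continue  # straggler: its evaluation never arrives
        x_plus = [xi + sigma * di for xi, di in zip(x, d)]
        x_minus = [xi - sigma * di for xi, di in zip(x, d)]
        # Central finite difference along direction d.
        slope = (f(x_plus) - f(x_minus)) / (2.0 * sigma)
        for i in range(n):
            grad[i] += slope * d[i]
        used += 1
    if used == 0:
        return list(x)  # every worker straggled; keep the current iterate
    return [xi - lr * gi / used for xi, gi in zip(x, grad)]
```

Calling `es_step` repeatedly on a simple objective such as the sum of squares should drive the iterate toward the minimizer. A coded scheme would instead try to recover the missing evaluations rather than drop them.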

arxiv_cs_LG:
Distributed Black-Box Optimization via Error Correcting Codes. Burak Bartan and Mert Pilanci https://t.co/m3PSpGsrmJ

Memoirs:
Distributed Black-Box Optimization via Error Correcting Codes. https://t.co/11btaBKFYE

Authors: 2

Total Words: 0

Unique Words: 0

Floating-point operations can significantly impact the accuracy and
performance of scientific applications on large-scale parallel systems.
Recently, an emerging floating-point format called Posit has attracted
attention as an alternative to the standard IEEE floating-point formats because
it could enable higher precision than IEEE formats using the same number of
bits. In this work, we first explore the feasibility of Posit encoding in
representative HPC applications by providing a 32-bit Posit NAS Parallel
Benchmark (NPB) suite. Then, we evaluate the accuracy improvement in different
HPC kernels compared to the IEEE 754 format. Our results indicate that
Posit encoding improves precision by 0.6 to 1.4 decimal
digits for all tested kernels and proxy-applications. We also quantify the
overhead of the current software implementation of Posit encoding as 4x-19x
that of the IEEE 754 hardware implementation. Our study highlights the potential of
hardware implementations of Posit to benefit a broad range of...

Stargazers: 0

Subscribers: 1

Forks: 0

Open Issues: 0

Authors: 3

Total Words: 3913

Unique Words: 1329

Smart gadgets are being embedded in almost every aspect of our lives. From
smart cities to smart watches, modern industries are increasingly supporting
the Internet-of-Things (IoT). SysMART aims at making supermarkets smart,
productive, and with a touch of modern lifestyle. While similar implementations
to improve the shopping experience exist, they tend mainly to replace the
shopping activity at the store with online shopping. Although online shopping
reduces time and effort, it deprives customers of the enjoyment of the experience.
SysMART relies on cutting-edge devices and technology to simplify and reduce
the time required during grocery shopping inside the supermarket. In addition,
the system monitors and maintains perishable products in good condition
suitable for human consumption. SysMART is built using state-of-the-art
technologies that support rapid prototyping and precision data acquisition. The
selected development environment is LabVIEW with its world-class interfacing
libraries. The paper comprises a detailed system...

Authors: 1

Total Words: 0

Unique Words: 0

Because reducing solution time is important in numerical codes, we propose
a parallel LU decomposition solver algorithm for dense and sparse matrices
on GPUs. The algorithm first bi-vectorizes the triangular matrices of the
decomposed coefficient matrix and then equalizes the resulting vectors. This
improves the performance of LU decomposition by distributing the work equally
across threads. The algorithm is also suitable for other parallelization
methods and for multi-device setups. Several test cases show the advantage of
this method over other well-known methods.

Authors: 3

Total Words: 0

Unique Words: 0

In this work an astrophysical simulation code, XFLAT, is developed to study
neutrino oscillations in supernovae. XFLAT is designed to utilize multiple
levels of parallelism through MPI, OpenMP, and SIMD instructions
(vectorization). It can run on both the CPU and the Xeon Phi co-processor, the
latter of which is based on the Intel Many Integrated Core Architecture (MIC).
The performance of XFLAT on different configurations and scenarios has been analyzed. In
addition, the impact of I/O and the multi-node configuration on the Xeon
Phi-equipped heterogeneous supercomputers such as Stampede at the Texas
Advanced Computing Center (TACC) was investigated.

Authors: 1

Total Words: 65735

Unique Words: 8823

The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important
building block for many numerical linear algebra kernel operations or graph
traversal applications. Parallelizing SymmSpMV on today's multicore platforms
with up to 100 cores is difficult due to the need to manage conflicting updates
on the result vector. Coloring approaches can be used to solve this problem
without data duplication, but existing coloring algorithms do not take load
balancing and deep memory hierarchies into account, hampering scalability and
full-chip performance. In this work, we propose the recursive algebraic
coloring engine (RACE), a novel coloring algorithm and open-source library
implementation, which eliminates the shortcomings of previous coloring methods
in terms of hardware efficiency and parallelization overhead. We describe the
level construction, distance-k coloring, and load balancing steps in RACE, use
it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices
with other state-of-the-art coloring...
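
The conflicting updates mentioned above can be seen in a minimal serial sketch of SymmSpMV. It assumes only the upper triangle (including the diagonal) of the symmetric matrix is stored in CSR form; the function name `symm_spmv` and the argument layout are hypothetical and are not taken from the RACE library.

```python
def symm_spmv(n, row_ptr, col_idx, values, x):
    """Compute y = A x for a symmetric A stored as its upper triangle in CSR.

    Each stored off-diagonal entry a_ij is used twice: once for row i and
    once for the mirrored entry a_ji. The second use writes to y[j], which is
    the scattered update that different threads could collide on if rows were
    processed in parallel without coloring or locking.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = values[k]
            y[i] += a * x[j]      # contribution of a_ij to row i
            if j != i:
                y[j] += a * x[i]  # mirrored contribution a_ji to row j
    return y


# Upper triangle of A = [[4, 1, 0], [1, 3, 2], [0, 2, 5]].
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
print(symm_spmv(3, row_ptr, col_idx, values, [1.0, 2.0, 3.0]))  # [6.0, 13.0, 19.0]
```

A coloring approach groups rows so that rows in the same group never write to the same entry of `y`, which lets each group run in parallel without duplicating the result vector.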

Authors: 8

Total Words: 0

Unique Words: 0

The stability and performance of Cloud platforms are essential as they
directly impact customers' satisfaction. Cloud service providers use Cloud
monitoring tools to ensure that rendered services match the quality of service
requirements indicated in established contracts such as service-level
agreements. Given the enormous number of resources that need to be monitored,
highly scalable and capable monitoring tools are designed and implemented by
Cloud service providers such as Amazon, Google, IBM, and Microsoft. Cloud
monitoring tools monitor millions of virtual and physical resources and
continuously generate logs for each one of them. Considering that logs magnify
any technical issue, they can be used for disaster detection, prevention, and
recovery. However, logs are useless if they are not assessed and analyzed
promptly. Thus, we argue that the scale of Cloud-generated logs makes it
impossible for DevOps teams to analyze them effectively. This implies that one
needs to automate the process of monitoring and analysis (e.g.,...

Authors: 5

Total Words: 0

Unique Words: 0

This paper proposes a versatile high-performance execution model, inspired by
systolic arrays, for memory-bound regular kernels running on CUDA-enabled GPUs.
We formulate a systolic model that shifts partial sums by CUDA warp primitives
for the computation. We also employ register files as a cache resource in order
to operate the entire model efficiently. We demonstrate the effectiveness and
versatility of the proposed model for a wide variety of stencil kernels that
appear commonly in HPC, and also convolution kernels (increasingly important in
deep learning workloads). Our algorithm outperforms the top reported
state-of-the-art stencil implementations, including implementations with
sophisticated temporal and spatial blocking techniques, on the two latest
Nvidia architectures: Tesla V100 and P100. For 2D convolution of general filter
sizes and shapes, our algorithm is on average 2.5x faster than Nvidia's NPP on
V100 and P100 GPUs.
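
The idea of shifting partial sums between lanes can be sketched in plain Python. This is an emulation under stated assumptions, not the paper's CUDA kernel: `shfl_up` is a hypothetical helper that mimics the behavior of CUDA's `__shfl_up_sync` on a list of per-lane values (lanes below the shift distance keep their own value), and `systolic_stencil` is an illustrative 1D stencil.

```python
def shfl_up(vals, delta):
    """Emulate a warp shuffle-up: lane i reads the value held by lane i - delta.

    Lanes with i < delta have no source lane and keep their own value.
    """
    return [vals[i - delta] if i >= delta else vals[i] for i in range(len(vals))]


def systolic_stencil(x, weights):
    """1D stencil where each lane holds one input and partial sums move up.

    After processing all weights, lane i (for i >= len(weights) - 1) holds
    sum_k weights[k] * x[i - (len(weights) - 1) + k].
    """
    # Each lane starts its partial sum with the first weight.
    partial = [weights[0] * xi for xi in x]
    for w in weights[1:]:
        # Shift partial sums one lane up, then add this lane's contribution.
        shifted = shfl_up(partial, 1)
        partial = [s + w * xi for s, xi in zip(shifted, x)]
    # The first len(weights) - 1 lanes hold incomplete sums; drop them.
    return partial[len(weights) - 1:]


print(systolic_stencil([1, 2, 3, 4, 5], (1, 10, 100)))  # [321, 432, 543]
```

Each input value stays in its lane (a register on a GPU) while the partial sums travel, so no lane needs to reload its neighbors' inputs from memory.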

Authors: 5

Total Words: 11918

Unique Words: 3479

We consider the standard population protocol model, where (a priori)
indistinguishable and anonymous agents interact in pairs according to uniformly
random scheduling. In this model, the only previously known protocol solving
the self-stabilizing leader election problem, by Cai, Izumi, and Wada [Theor.
Comput. Syst. 50], runs in expected parallel time $\Theta(n^2)$ and has the
optimal number of n states in a population of n agents. This protocol has the
additional property that it becomes silent, i.e., the agents' states eventually
stop changing. Observing that any silent protocol solving self-stabilizing
leader election requires $\Omega(n)$ expected parallel time, we introduce a
silent protocol that runs in optimal $O(n)$ expected parallel time with an
exponential number of states, as well as a protocol with a slightly worse
expected time complexity of $O(n\log n)$ but with the asymptotically optimal
$O(n)$ states. Without any silence or state space constraints, we show that it
is possible to solve self-stabilizing leader election...
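
The population protocol model itself can be illustrated with a short simulation. The sketch below is an assumption-laden example, not one of the protocols from the abstract: it runs the classic two-state leader election rule (when two leaders meet, one becomes a follower) from the all-leaders configuration. That rule is not self-stabilizing, since a population that starts with no leader can never create one. The function name `simulate_leader_election` is hypothetical.

```python
import random


def simulate_leader_election(n, seed=0):
    """Simulate the two-state rule (leader, leader) -> (leader, follower).

    A uniformly random scheduler picks an ordered pair of distinct agents at
    each step. Parallel time is the number of interactions divided by n.
    Returns the final leader flags and the parallel time until one leader is left.
    """
    rng = random.Random(seed)
    is_leader = [True] * n  # assumed initial configuration: everyone is a leader
    leaders = n
    interactions = 0
    while leaders > 1:
        a, b = rng.sample(range(n), 2)  # two distinct agents interact
        interactions += 1
        if is_leader[a] and is_leader[b]:
            is_leader[b] = False  # the responder gives up leadership
            leaders -= 1
    return is_leader, interactions / n


flags, parallel_time = simulate_leader_election(50)
print(sum(flags), parallel_time)
```

A self-stabilizing protocol must reach a single leader from every possible starting configuration, including one with no leaders, which is why it needs more states than this two-state rule.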

Authors: 5

Total Words: 15722

Unique Words: 2950

Bolted is a new architecture for bare-metal clouds that enables tenants to
control tradeoffs between security, price, and performance. Security-sensitive
tenants can minimize their trust in the public cloud provider and achieve
levels of security and control similar to those they can obtain in their own
private data centers. At the same time, Bolted neither imposes overhead on
tenants that are security insensitive nor compromises the flexibility or
operational efficiency of the provider. Our prototype exploits a novel
provisioning system and specialized firmware to enable elasticity similar to
virtualized clouds. Experimentally, we quantify the cost of different levels of
security for a variety of workloads and demonstrate the value of giving control
to the tenant.

Authors: 11

Total Words: 12158

Unique Words: 3434

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored in real time based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

*Tracking 158,360 papers.*
