### Top 10 arXiv Papers Today in Distributed, Parallel, and Cluster Computing

##### #1. Distributed Black-Box Optimization via Error Correcting Codes
###### Burak Bartan, Mert Pilanci
We introduce a novel distributed derivative-free optimization framework that is resilient to stragglers. The proposed method employs coded search directions at which the objective function is evaluated, and a decoding step to find the next iterate. Our framework can be seen as an extension of evolution strategies and structured exploration methods where structured search directions were utilized. As an application, we consider black-box adversarial attacks on deep convolutional neural networks. Our numerical experiments demonstrate a significant improvement in the computation times.
###### Tweets
arxiv_cs_LG: Distributed Black-Box Optimization via Error Correcting Codes. Burak Bartan and Mert Pilanci https://t.co/m3PSpGsrmJ
Memoirs: Distributed Black-Box Optimization via Error Correcting Codes. https://t.co/11btaBKFYE
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unique Words: 0

##### #2. Posit NPB: Assessing the Precision Improvement in HPC Scientific Applications
###### Steven W. D. Chien, Ivy B. Peng, Stefano Markidis
Floating-point operations can significantly impact the accuracy and performance of scientific applications on large-scale parallel systems. Recently, an emerging floating-point format called Posit has attracted attention as an alternative to the standard IEEE floating-point formats because it could enable higher precision than IEEE formats using the same number of bits. In this work, we first explore the feasibility of Posit encoding in representative HPC applications by providing a 32-bit Posit NAS Parallel Benchmark (NPB) suite. Then, we evaluate the accuracy improvement in different HPC kernels compared to the IEEE 754 format. Our results indicate that using Posit encoding achieves optimized precision, ranging from 0.6 to 1.4 decimal digits, for all tested kernels and proxy-applications. Also, we quantify the overhead of the current software implementation of Posit encoding as 4x-19x that of the IEEE 754 hardware implementation. Our study highlights the potential of hardware implementations of Posit to benefit a broad range of...
###### Github
Repository: NAS-Posit-benchmark
User: steven-chien
Language: C++
Stargazers: 0
Subscribers: 1
Forks: 0
Open Issues: 0
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 3913
Unique Words: 1329

##### #3. A Unified Analysis Approach for Hardware and Software Implementations
###### Issam Damaj
Smart gadgets are being embedded in almost every aspect of our lives. From smart cities to smart watches, modern industries are increasingly supporting the Internet-of-Things (IoT). SysMART aims at making supermarkets smart, productive, and with a touch of modern lifestyle. While similar implementations to improve the shopping experience exist, they tend mainly to replace the shopping activity at the store with online shopping. Although online shopping reduces time and effort, it deprives customers of enjoying the experience. SysMART relies on cutting-edge devices and technology to simplify and reduce the time required during grocery shopping inside the supermarket. In addition, the system monitors and maintains perishable products in good condition suitable for human consumption. SysMART is built using state-of-the-art technologies that support rapid prototyping and precision data acquisition. The selected development environment is LabVIEW with its world-class interfacing libraries. The paper comprises a detailed system...
###### Other stats
Sample Sizes : None.
Authors: 1
Total Words: 0
Unique Words: 0

##### #4. Equal bi-Vectorized (EBV) method to high performance on GPU
###### Amirreza Hashemi, Mohsen Lahooti, Ebrahim Shirani
Due to the importance of reducing solution time in numerical codes, we propose an algorithm for a parallel LU decomposition solver for dense and sparse matrices on GPUs. The algorithm first bi-vectorizes the triangular matrices of the decomposed coefficient matrix and then equalizes the vectors. This improves the performance of LU decomposition by giving threads an equal share of the work. The algorithm is also suitable for other parallelization methods and for multiple devices. Several test cases show the advantage of this method over other familiar methods.
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unique Words: 0

##### #5. Simulating Nonlinear Neutrino Oscillations on Next-Generation Many-Core Architectures
###### Vahid Noormofidi
In this work, an astrophysical simulation code, XFLAT, is developed to study neutrino oscillations in supernovae. XFLAT is designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both the CPU and the Xeon Phi co-processor, the latter of which is based on the Intel Many Integrated Core Architecture (MIC). The performance of XFLAT on various configurations and scenarios has been analyzed. In addition, the impact of I/O and the multi-node configuration on Xeon Phi-equipped heterogeneous supercomputers such as Stampede at the Texas Advanced Computing Center (TACC) was investigated.
###### Other stats
Sample Sizes : None.
Authors: 1
Total Words: 65735
Unique Words: 8823

##### #6. A Recursive Algebraic Coloring Technique for Hardware-Efficient Symmetric Sparse Matrix-Vector Multiplication
###### Christie L. Alappat, Georg Hager, Olaf Schenk, Jonas Thies, Achim Basermann, Alan R. Bishop, Holger Fehske, Gerhard Wellein
The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building block for many numerical linear algebra kernel operations or graph traversal applications. Parallelizing SymmSpMV on today's multicore platforms with up to 100 cores is difficult due to the need to manage conflicting updates on the result vector. Coloring approaches can be used to solve this problem without data duplication, but existing coloring algorithms do not take load balancing and deep memory hierarchies into account, hampering scalability and full-chip performance. In this work, we propose the recursive algebraic coloring engine (RACE), a novel coloring algorithm and open-source library implementation, which eliminates the shortcomings of previous coloring methods in terms of hardware efficiency and parallelization overhead. We describe the level construction, distance-k coloring, and load balancing steps in RACE, use it to parallelize SymmSpMV, and compare its performance on 31 sparse matrices with other state-of-the-art coloring...
###### Other stats
Sample Sizes : None.
Authors: 8
Total Words: 0
Unique Words: 0
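
To make the "conflicting updates on the result vector" in the abstract of #6 concrete, here is a minimal serial sketch of SymmSpMV in Python. It is not the RACE library and does not implement its coloring. It assumes only the upper triangle (including the diagonal) is stored in CSR form. The function name `symm_spmv` and the array names `row_ptr`, `col_idx`, and `vals` are illustrative choices, not taken from the paper.

```python
def symm_spmv(row_ptr, col_idx, vals, x):
    """Compute y = A @ x for symmetric A stored as its upper triangle in CSR.

    Assumption for this sketch: each row i stores only entries with
    column j >= i, so every off-diagonal value is stored exactly once.
    """
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            a = vals[k]
            # Contribution of the stored entry A[i][j] to row i.
            y[i] += a * x[j]
            if j != i:
                # Contribution of the mirrored entry A[j][i] to row j.
                # If rows were processed in parallel, this write to y[j]
                # could race with the thread that owns row j. This is the
                # conflict that coloring approaches aim to avoid.
                y[j] += a * x[i]
    return y


# A = [[4, 1, 0],
#      [1, 3, 2],
#      [0, 2, 5]]  (only the upper triangle is stored)
row_ptr = [0, 2, 4, 5]
col_idx = [0, 1, 1, 2, 2]
vals = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

print(symm_spmv(row_ptr, col_idx, vals, x))  # [6.0, 13.0, 19.0]
```

Storing only one triangle roughly halves the matrix data that must be read, which is why the kernel is attractive despite the write conflicts it introduces under parallel execution.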

##### #7. Dogfooding: use IBM Cloud services to monitor IBM Cloud infrastructure
###### William Pourmajidi, Andriy Miranskyy, John Steinbacher, Tony Erwin, David Godwin
The stability and performance of Cloud platforms are essential as they directly impact customers' satisfaction. Cloud service providers use Cloud monitoring tools to ensure that rendered services match the quality of service requirements indicated in established contracts such as service-level agreements. Given the enormous number of resources that need to be monitored, highly scalable and capable monitoring tools are designed and implemented by Cloud service providers such as Amazon, Google, IBM, and Microsoft. Cloud monitoring tools monitor millions of virtual and physical resources and continuously generate logs for each one of them. Considering that logs magnify any technical issue, they can be used for disaster detection, prevention, and recovery. However, logs are useless if they are not assessed and analyzed promptly. Thus, we argue that the scale of Cloud-generated logs makes it impossible for DevOps teams to analyze them effectively. This implies that one needs to automate the process of monitoring and analysis (e.g.,...
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unique Words: 0

##### #8. A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels
###### Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka
This paper proposes a versatile high-performance execution model, inspired by systolic arrays, for memory-bound regular kernels running on CUDA-enabled GPUs. We formulate a systolic model that shifts partial sums by CUDA warp primitives for the computation. We also employ register files as a cache resource in order to operate the entire model efficiently. We demonstrate the effectiveness and versatility of the proposed model for a wide variety of stencil kernels that appear commonly in HPC, and also convolution kernels (increasingly important in deep learning workloads). Our algorithm outperforms the top reported state-of-the-art stencil implementations, including implementations with sophisticated temporal and spatial blocking techniques, on the two latest Nvidia architectures: Tesla V100 and P100. For 2D convolution of general filter sizes and shapes, our algorithm is on average 2.5x faster than Nvidia's NPP on V100 and P100 GPUs.
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 11918
Unique Words: 3479

##### #9. Efficient self-stabilizing leader election in population protocols
###### Janna Burman, David Doty, Thomas Nowak, Eric E. Severson, Chuan Xu
We consider the standard population protocol model, where (a priori) indistinguishable and anonymous agents interact in pairs according to uniformly random scheduling. In this model, the only previously known protocol solving the self-stabilizing leader election problem by Cai, Izumi, and Wada [Theor. Comput. Syst. 50] runs in expected parallel time $\Theta(n^2)$ and has the optimal number of n states in a population of n agents. This protocol has the additional property that it becomes silent, i.e., the agents' states eventually stop changing. Observing that any silent protocol solving self-stabilizing leader election requires $\Omega(n)$ expected parallel time, we introduce a silent protocol that runs in optimal $O(n)$ expected parallel time with an exponential number of states, as well as a protocol with a slightly worse expected time complexity of $O(n\log n)$ but with the asymptotically optimal $O(n)$ states. Without any silence or state space constraints, we show that it is possible to solve self-stabilizing leader election...
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 15722
Unique Words: 2950
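
The population protocol model in the abstract of #9 can be illustrated with a small simulation. The sketch below is in Python and rests on an assumption: it uses the commonly described n-state rule attributed to Cai, Izumi, and Wada, in which two agents that meet in the same state cause one of them to advance to the next state modulo n, and the agent in state 0 is treated as the leader. That rule is not spelled out in the abstract itself, and the function name `simulate` and its parameters are illustrative.

```python
import random


def simulate(initial_states, seed=0, max_steps=10_000_000):
    """Run random pairwise interactions until all agent states are distinct.

    Assumed transition rule (see the lead-in): when initiator and responder
    hold the same state s, the responder moves to (s + 1) mod n. Once every
    state in 0..n-1 is held by exactly one agent, no rule applies any more,
    so the configuration is silent.
    """
    rng = random.Random(seed)
    states = list(initial_states)
    n = len(states)
    steps = 0
    while len(set(states)) < n and steps < max_steps:
        # Uniformly random ordered pair of distinct agents.
        i, j = rng.sample(range(n), 2)
        if states[i] == states[j]:
            states[j] = (states[j] + 1) % n
        steps += 1
    return states, steps


# Self-stabilization means any starting configuration must work;
# here every agent starts in state 0, so all six claim leadership.
final_states, steps = simulate([0] * 6)
leaders = [agent for agent, s in enumerate(final_states) if s == 0]
print(sorted(final_states), len(leaders), steps / 6)
```

The last printed value is the number of interactions divided by n, which is the usual definition of parallel time in this model.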

##### #10. Supporting Security Sensitive Tenants in a Bare-Metal Cloud
###### Amin Mosayyebzadeh, Apoorve Mohan, Sahil Tikale, Mania Abdi, Nabil Schear, Charles Munson, Trammell Hudson, Larry Rudolph, Gene Cooperman, Peter Desnoyers, Orran Krieger
Bolted is a new architecture for bare-metal clouds that enables tenants to control tradeoffs between security, price, and performance. Security-sensitive tenants can minimize their trust in the public cloud provider and achieve similar levels of security and control that they can obtain in their own private data centers. At the same time, Bolted neither imposes overhead on tenants that are security insensitive nor compromises the flexibility or operational efficiency of the provider. Our prototype exploits a novel provisioning system and specialized firmware to enable elasticity similar to virtualized clouds. Experimentally we quantify the cost of different levels of security for a variety of workloads and demonstrate the value of giving control to the tenant.
###### Other stats
Sample Sizes : None.
Authors: 11
Total Words: 12158
Unique Words: 3434

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 158,360 papers.