### Top 10 arXiv Papers Today in Distributed, Parallel, and Cluster Computing

##### #1. On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning
###### Aritra Dutta, El Houcine Bergou, Ahmed M. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis
Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy between theory and practice: while theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model, many practical implementations operate individually on the gradients of each layer of the model. In this paper, we prove that layer-wise compression is, in theory, better, because the convergence rate is upper bounded by that of entire-model compression for a wide range of biased and unbiased compression methods. However, despite the theoretical bound, our experimental study of six well-known methods shows that convergence, in practice, may or may not be better, depending on the actual trained model and compression ratio. Our findings suggest that it would be advantageous for deep learning frameworks to include support for both layer-wise...
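To make the layer-wise versus entire-model distinction concrete, here is a minimal pure-Python sketch using top-k sparsification, one well-known biased compressor of the kind the abstract refers to. The function names and toy gradients are illustrative assumptions, not the authors' code. The sketch only shows which coordinates survive under the same total budget; it says nothing about the convergence behaviour the paper analyses.

```python
def top_k(values, ratio):
    """Keep the round(ratio * len(values)) largest-magnitude entries (at least one); zero the rest."""
    k = max(1, round(ratio * len(values)))
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def entire_model_compress(layer_grads, ratio):
    """Apply top-k once to the concatenation of all layers, then split back per layer."""
    flat = [g for layer in layer_grads for g in layer]
    sparse = top_k(flat, ratio)
    out, start = [], 0
    for layer in layer_grads:
        out.append(sparse[start:start + len(layer)])
        start += len(layer)
    return out


def layer_wise_compress(layer_grads, ratio):
    """Apply top-k independently to each layer's gradient."""
    return [top_k(layer, ratio) for layer in layer_grads]


# Hypothetical gradients: layer 0 has much larger magnitudes than layer 1.
grads = [[10.0, 9.0, 8.0, 7.0], [0.1, 0.2, 0.3, 0.4]]
print(entire_model_compress(grads, 0.25))  # layer 1 is zeroed out entirely
print(layer_wise_compress(grads, 0.25))    # each layer keeps its own largest entry
```

With entire-model compression the small-magnitude layer can lose all of its updates, while layer-wise compression guarantees every layer a share of the budget; which choice trains better is exactly the empirical question the paper examines.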
###### Tweets
BrundageBot: On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning. Dutta, Bergou, Abdelmoniem, Ho, Sahu, Canini, and Kalnis https://t.co/7irTdyRZck
Memoirs: On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning. https://t.co/b8CzBsmz5v
###### Other stats
Authors: 7
Total Words: 7874
Unique Words: 1991

##### #2. The Design and Implementation of a Scalable DL Benchmarking Platform
###### Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu
The current Deep Learning (DL) landscape is fast-paced and is rife with non-uniform models, hardware/software (HW/SW) stacks, but lacks a DL benchmarking platform to facilitate evaluation and comparison of DL innovations, be it models, frameworks, libraries, or hardware. Due to the lack of a benchmarking platform, the current practice of evaluating the benefits of proposed DL innovations is both arduous and error-prone - stifling the adoption of the innovations. In this work, we first identify 10 design features which are desirable within a DL benchmarking platform. These features include: performing the evaluation in a consistent, reproducible, and scalable manner, being framework and hardware agnostic, supporting real-world benchmarking workloads, providing in-depth model execution inspection across the HW/SW stack levels, etc. We then propose MLModelScope, a DL benchmarking platform design that realizes the 10 objectives. MLModelScope proposes a specification to define DL model evaluations and techniques to provision the...
###### Tweets
BrundageBot: The Design and Implementation of a Scalable DL Benchmarking Platform. Cheng Li, Abdul Dakkak, Jinjun Xiong, and Wen-mei Hwu https://t.co/TLmgYRr6nk
StatsPapers: The Design and Implementation of a Scalable DL Benchmarking Platform. https://t.co/SHODPkcb47
###### Other stats
Authors: 4
Total Words: 11172
Unique Words: 3401

##### #3. Distributed Machine Learning through Heterogeneous Edge Systems
###### Hanpeng Hu, Dan Wang, Chuan Wu
Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large volumes and/or security/privacy concerns. Edge devices are intrinsically heterogeneous in computing capacity, posing significant challenges to parameter synchronization for parallel training with the parameter server (PS) architecture. This paper proposes ADSP, a parameter synchronization scheme for distributed machine learning (ML) with heterogeneous edge systems. Eliminating the significant waiting time occurring with existing parameter synchronization models, the core idea of ADSP is to let faster edge devices continue training, while committing their model updates at strategically decided intervals. We design algorithms that decide time points for each worker to commit its model update, and ensure not only global model convergence but also faster convergence. Our testbed implementation and...
###### Tweets
arxiv_in_review: #AAAI2020 Distributed Machine Learning through Heterogeneous Edge Systems. (arXiv:1911.06949v1 [cs.DC]) https://t.co/0d9ToMXzDs
Memoirs: Distributed Machine Learning through Heterogeneous Edge Systems. https://t.co/DH50DVeplZ
drahmadbazzi: RT @Memoirs: Distributed Machine Learning through Heterogeneous Edge Systems. https://t.co/DH50DVeplZ
###### Other stats
Authors: 3
Total Words: 0
Unique Words: 0

##### #4. PES: Proactive Event Scheduling for Responsive and Energy-Efficient Mobile Web Computing
###### Yu Feng, Yuhao Zhu
Web applications are gradually shifting toward resource-constrained mobile devices. As a result, the Web runtime system must simultaneously address two challenges: responsiveness and energy-efficiency. Conventional Web runtime systems fall short due to their reactive nature: they react to a user event only after it is triggered. The reactive strategy leads to local optimizations that schedule event executions one at a time, missing global optimization opportunities. This paper proposes Proactive Event Scheduling (PES). The key idea of PES is to proactively anticipate future events and thereby globally coordinate scheduling decisions across events. Specifically, PES predicts events that are likely to happen in the near future using a combination of statistical inference and application code analysis. PES then speculatively executes future events ahead of time in a way that satisfies the QoS constraints of all the events while minimizing the global energy consumption. Fundamentally, PES unlocks more optimization opportunities by...
###### Other stats
Authors: 2
Total Words: 0
Unique Words: 0

##### #5. A Spark ML driven preprocessing approach for deep learning based scholarly data applications
###### Samiya Khan, Xiufeng Liu, Mansaf Alam
Big data has found applications in multiple domains. One of the largest sources of textual big data is scientific documents and papers. Big scholarly data have been used in numerous ways to create innovative applications such as collaborator discovery, expert finding and research management systems. With the advent of advanced machine and deep learning techniques, the accuracy and novelty of such applications have risen manifold. However, the biggest challenge in the development of deep learning models for scholarly applications in cloud based environment is the underutilization of resources because of the excessive time taken by textual preprocessing. This paper presents a preprocessing pipeline that makes use of Spark for data ingestion and Spark ML for pipelining preprocessing tasks. The evaluation of the proposed work is done using a case study, which uses LSTM based text summarization for generating title or summary from abstract of any research. The ingestion, preprocessing and cumulative time for the proposed approach...
###### Other stats
Authors: 3
Total Words: 0
Unique Words: 0

##### #6. Towards Design Methodology of Efficient Fast Algorithms for Accelerating Generative Adversarial Networks on FPGAs
###### Jung-Woo Chang, Saehyun Ahn, Keon-Woo Kang, Suk-Ju Kang
Generative adversarial networks (GANs) have shown excellent performance in image and speech applications. GANs create impressive data primarily through a new type of operator called deconvolution (DeConv), also known as transposed convolution. To implement the DeConv layer in hardware, the state-of-the-art accelerator reduces the high computational complexity via the DeConv-to-Conv conversion and achieves the same results. However, this conversion increases the number of filters. Recently, Winograd minimal filtering has been recognized as an effective solution to improve the arithmetic complexity and resource efficiency of the Conv layer. In this paper, we propose an efficient Winograd DeConv accelerator that combines these two orthogonal approaches on FPGAs. Firstly, we introduce a new class of fast algorithm for DeConv layers using Winograd minimal filtering. Since there are regular sparse patterns in Winograd filters, we further amortize the computational complexity by skipping zero weights. Secondly,...
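Winograd minimal filtering can be illustrated with the standard 1-D case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6 once the filter transform is precomputed. This is a generic textbook instance written as a correlation (as CNN "convolution" layers compute), not the paper's DeConv algorithm or its tile sizes; the zero check merely gestures at the idea of skipping zero weights in the transformed filter. All function names are hypothetical.

```python
def transform_filter(g):
    """Filter transform for F(2,3): 3 taps -> 4 values (precomputed once per filter)."""
    g0, g1, g2 = g
    return [g0, (g0 + g1 + g2) / 2, (g0 - g1 + g2) / 2, g2]


def winograd_f23(d, u):
    """Two outputs of a 3-tap correlation over 4 inputs, given the transformed filter u.

    Returns (outputs, number of multiplications actually performed).
    """
    d0, d1, d2, d3 = d
    v = [d0 - d2, d1 + d2, d2 - d1, d1 - d3]  # input transform (additions only)
    m, mults = [], 0
    for vi, ui in zip(v, u):
        if ui == 0:
            m.append(0.0)  # zero transformed weight: skip this multiplication
        else:
            m.append(vi * ui)
            mults += 1
    y0 = m[0] + m[1] + m[2]  # output transform (additions only)
    y1 = m[1] - m[2] - m[3]
    return [y0, y1], mults


def direct_f23(d, g):
    """Reference: the same two outputs computed directly with 6 multiplications."""
    return [sum(d[i + j] * g[j] for j in range(3)) for i in range(2)]


d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]
print(winograd_f23(d, transform_filter(g)), direct_f23(d, g))
```

Both paths give the same outputs; the saving comes from moving work into additions and a filter transform that, for a fixed trained filter, is paid only once.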
###### Tweets
arxivml: "Towards Design Methodology of Efficient Fast Algorithms for Accelerating Generative Adversarial Networks on FPGAs"… https://t.co/58Co4lyDVM
###### Other stats
Authors: 4
Total Words: 4281
Unique Words: 1382

##### #7. Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU
###### Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Keshav Pingali
There is growing interest in graph mining algorithms such as motif counting. Generic graph mining systems have been developed to provide unified interfaces for programming these algorithms. However, existing systems take minutes or even hours to mine even simple patterns in moderate-sized graphs, which significantly limits their real-world usability. We present Pangolin, a high-performance and flexible in-memory graph mining framework targeting both shared-memory CPUs and GPUs. Pangolin is the first graph mining system that supports GPU processing. We provide a simple embedding-centric programming interface based on the extend-reduce-filter model, which enables users to specify application-specific knowledge like aggressive enumeration search space pruning and isomorphism test elimination. We also describe novel optimizations that exploit locality, reduce memory consumption, and mitigate overheads of dynamic memory allocation and synchronization. Evaluation on a 28-core CPU demonstrates that Pangolin outperforms Arabesque and...
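The embedding-centric style can be pictured with a toy, single-threaded triangle counter. This is a hypothetical sketch, not Pangolin's API: embeddings are grown one vertex at a time (extend), non-matching ones are dropped (filter), and the survivors are aggregated (reduce, here a count). The increasing-vertex-id rule is one common way to avoid enumerating the same triangle several times, in the spirit of the search-space pruning the abstract mentions.

```python
def count_triangles(edges):
    """Toy extend-filter-reduce pass that counts triangles in an undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Initial embeddings: every undirected edge exactly once, as (u, v) with u < v.
    embeddings = [(u, v) for u in adj for v in adj[u] if u < v]

    # Extend: append a neighbor w of the last vertex; requiring w > v means each
    # vertex set is generated in a single canonical order (prunes duplicates).
    extended = [(u, v, w) for (u, v) in embeddings for w in adj[v] if w > v]

    # Filter: keep only embeddings whose first and last vertices are also adjacent.
    triangles = [(u, v, w) for (u, v, w) in extended if u in adj[w]]

    # Reduce: aggregate the surviving embeddings (here, simply count them).
    return len(triangles)


# A square 0-1-2-3 with the diagonal 0-2 contains two triangles.
print(count_triangles([(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]))
```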
###### Other stats
Authors: 4
Total Words: 13392
Unique Words: 3644

##### #8. Exact Byzantine Consensus on Arbitrary Directed Graphs under Local Broadcast Model
###### Muhammad Samir Khan, Lewis Tseng, Nitin H. Vaidya
We consider Byzantine consensus in a synchronous system where nodes are connected by a network modeled as a directed graph, i.e., communication links between neighboring nodes are not necessarily bi-directional. The directed graph model is motivated by wireless networks wherein asymmetric communication links can occur. In the classical point-to-point communication model, a message sent on a communication link is private between the two nodes on the link. This allows a Byzantine faulty node to equivocate, i.e., send inconsistent information to its neighbors. This paper considers the local broadcast model of communication, wherein transmission by a node is received identically by all of its outgoing neighbors. This allows such neighbors to detect a faulty node's attempt to equivocate, effectively depriving the faulty nodes of the ability to send conflicting information to different neighbors. Prior work has obtained sufficient and necessary conditions on undirected graphs to be able to achieve Byzantine consensus under the local...
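The difference between the two communication models can be shown with a one-round toy model. Under point-to-point links a faulty sender can address a different value to each neighbor, and no single neighbor sees the inconsistency; under local broadcast every transmission reaches all outgoing neighbors identically, so sending two different values is visible to all of them. The functions below are hypothetical illustrations and do not capture the directed-graph conditions the paper derives.

```python
def point_to_point(per_neighbor_values):
    """The sender picks a (possibly different) value per neighbor; each sees only its own."""
    return {nbr: [val] for nbr, val in per_neighbor_values.items()}


def local_broadcast(neighbors, transmissions):
    """Every transmission is received identically by all outgoing neighbors."""
    return {nbr: list(transmissions) for nbr in neighbors}


def detects_equivocation(received):
    """A neighbor flags the sender if it heard more than one distinct value."""
    return len(set(received)) > 1


# A faulty node tries to tell neighbor "a" the value 0 and neighbor "b" the value 1.
p2p = point_to_point({"a": 0, "b": 1})
lb = local_broadcast(["a", "b"], [0, 1])
print({n: detects_equivocation(r) for n, r in p2p.items()})  # nobody notices
print({n: detects_equivocation(r) for n, r in lb.items()})   # everyone notices
```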
###### Other stats
Authors: 3
Total Words: 14160
Unique Words: 1945

##### #9. A Code injection Method for Rapid Docker Image Building
###### Yujing Wang, Qinyang Bao
Docker images are built by layers, yet the current implementation has major inefficiencies that make rebuilding an image unnecessarily slow when changes in bottom layers are required. In this paper, we propose a code injection method that overcomes these inefficiencies by targeting only the changed layer and then bypassing Docker's layer checksum process.
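Why a bottom-layer change is expensive can be seen from how layered images identify layer stacks. The OCI image specification defines ChainID recursively: ChainID(L0) = DiffID(L0), and each later ChainID digests the previous ChainID, a space, and the next layer's DiffID. So changing one layer changes the identity of every layer stacked above it. The sketch below assumes a simplified DiffID (sha256 of the raw layer bytes) and hypothetical helper names; Docker's build cache differs in detail, and the paper's code-injection method is not reproduced.

```python
import hashlib


def digest(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def chain_ids(layers):
    """ChainID of each layer prefix, following the OCI image-spec recurrence.

    Simplification: each layer's DiffID is taken as the sha256 of its raw bytes
    (in the spec it is the digest of the uncompressed layer tar archive).
    """
    ids = []
    for content in layers:
        diff_id = digest(content)
        if not ids:
            ids.append(diff_id)  # ChainID(L0) = DiffID(L0)
        else:
            ids.append(digest((ids[-1] + " " + diff_id).encode()))
    return ids


def invalidated(old_layers, new_layers):
    """Indices of layers whose ChainID changed between two builds."""
    pairs = zip(chain_ids(old_layers), chain_ids(new_layers))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


base = [b"os base", b"dependencies", b"application code"]
print(invalidated(base, [b"os base (patched)", b"dependencies", b"application code"]))  # -> [0, 1, 2]
print(invalidated(base, [b"os base", b"dependencies", b"application code v2"]))  # -> [2]
```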
###### Other stats
Authors: 2
Total Words: 1873
Unique Words: 793

##### #10. Decentralization in Open Quorum Systems
###### Andrea Bracciali, Davide Grossi, Ronald de Haan
Decentralisation is one of the promises introduced by blockchain technologies: fair and secure interaction amongst peers with no dominant positions, single points of failure or censorship. Decentralisation, however, appears difficult to be formally defined, possibly a continuum property of systems that can be more or less decentralised, or can tend to decentralisation in their lifetime. In this paper we focus on decentralisation in quorum-based approaches to open (permissionless) consensus as illustrated in influential protocols such as the Ripple and Stellar protocols. Drawing from game theory and computational complexity, we establish limiting results concerning the decentralisation vs. safety trade-off in Ripple and Stellar, and we propose a novel methodology to formalise and quantitatively analyse decentralisation in this type of blockchains.
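For readers unfamiliar with quorum-based open consensus, the federated Byzantine agreement model behind Stellar can be sketched in a few lines: each node declares quorum slices, and a nonempty set of nodes is a quorum if every member has at least one slice contained in the set. The "nodes in every quorum" check below is a crude, hypothetical signal of a dominant position, added for illustration only; it is not the paper's decentralisation measure, and Ripple's unique-node-list model is not represented.

```python
from itertools import combinations


def is_quorum(candidate, slices):
    """A nonempty node set is a quorum if every member has a slice inside it."""
    return bool(candidate) and all(any(s <= candidate for s in slices[n]) for n in candidate)


def all_quorums(slices):
    """Brute-force enumeration; fine for a handful of nodes only."""
    nodes = sorted(slices)
    return [set(c) for r in range(1, len(nodes) + 1)
            for c in combinations(nodes, r) if is_quorum(set(c), slices)]


def nodes_in_every_quorum(slices):
    """Nodes that no quorum can avoid: a crude signal of a dominant position."""
    qs = all_quorums(slices)
    return set.intersection(*qs) if qs else set()


# Symmetric configuration: each of 4 nodes accepts any 3 nodes that include itself.
peers = ["a", "b", "c", "d"]
symmetric = {n: [set(c) for c in combinations(peers, 3) if n in c] for n in peers}

# Hub configuration: every node's only slice is itself plus "hub".
hub = {"hub": [{"hub"}], "a": [{"a", "hub"}], "b": [{"b", "hub"}]}

print(nodes_in_every_quorum(symmetric), nodes_in_every_quorum(hub))
```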
###### Other stats
Authors: 3
Total Words: 0
Unique Words: 0

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

Tracking 225,737 papers.
