Top 10 Arxiv Papers Today in Distributed, Parallel, And Cluster Computing


2.008 Mikeys
#1. On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning
Aritra Dutta, El Houcine Bergou, Ahmed M. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis
Compressed communication, in the form of sparsification or quantization of stochastic gradients, is employed to reduce communication costs in distributed data-parallel training of deep neural networks. However, there exists a discrepancy between theory and practice: while theoretical analysis of most existing compression methods assumes compression is applied to the gradients of the entire model, many practical implementations operate individually on the gradients of each layer of the model. In this paper, we prove that layer-wise compression is, in theory, better, because the convergence rate is upper bounded by that of entire-model compression for a wide range of biased and unbiased compression methods. However, despite the theoretical bound, our experimental study of six well-known methods shows that convergence, in practice, may or may not be better, depending on the actual trained model and compression ratio. Our findings suggest that it would be advantageous for deep learning frameworks to include support for both layer-wise...
more | pdf | html
Figures
None.
Tweets
BrundageBot: On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning. Dutta, Bergou, Abdelmoniem, Ho, Sahu, Canini, and Kalnis https://t.co/7irTdyRZck
Memoirs: On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning. https://t.co/b8CzBsmz5v
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 7
Total Words: 7874
Unqiue Words: 1991

2.008 Mikeys
#2. The Design and Implementation of a Scalable DL Benchmarking Platform
Cheng Li, Abdul Dakkak, Jinjun Xiong, Wen-mei Hwu
The current Deep Learning (DL) landscape is fast-paced and is rife with non-uniform models, hardware/software (HW/SW) stacks, but lacks a DL benchmarking platform to facilitate evaluation and comparison of DL innovations, be it models, frameworks, libraries, or hardware. Due to the lack of a benchmarking platform, the current practice of evaluating the benefits of proposed DL innovations is both arduous and error-prone - stifling the adoption of the innovations. In this work, we first identify $10$ design features which are desirable within a DL benchmarking platform. These features include: performing the evaluation in a consistent, reproducible, and scalable manner, being framework and hardware agnostic, supporting real-world benchmarking workloads, providing in-depth model execution inspection across the HW/SW stack levels, etc. We then propose MLModelScope, a DL benchmarking platform design that realizes the $10$ objectives. MLModelScope proposes a specification to define DL model evaluations and techniques to provision the...
more | pdf | html
Figures
None.
Tweets
BrundageBot: The Design and Implementation of a Scalable DL Benchmarking Platform. Cheng Li, Abdul Dakkak, Jinjun Xiong, and Wen-mei Hwu https://t.co/TLmgYRr6nk
StatsPapers: The Design and Implementation of a Scalable DL Benchmarking Platform. https://t.co/SHODPkcb47
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 11172
Unqiue Words: 3401

2.001 Mikeys
#3. Distributed Machine Learning through Heterogeneous Edge Systems
Hanpeng Hu, Dan Wang, Chuan Wu
Many emerging AI applications request distributed machine learning (ML) among edge systems (e.g., IoT devices and PCs at the edge of the Internet), where data cannot be uploaded to a central venue for model training, due to their large volumes and/or security/privacy concerns. Edge devices are intrinsically heterogeneous in computing capacity, posing significant challenges to parameter synchronization for parallel training with the parameter server (PS) architecture. This paper proposes ADSP, a parameter synchronization scheme for distributed machine learning (ML) with heterogeneous edge systems. Eliminating the significant waiting time occurring with existing parameter synchronization models, the core idea of ADSP is to let faster edge devices continue training, while committing their model updates at strategically decided intervals. We design algorithms that decide time points for each worker to commit its model update, and ensure not only global model convergence but also faster convergence. Our testbed implementation and...
more | pdf | html
Figures
None.
Tweets
arxiv_in_review: #AAAI2020 Distributed Machine Learning through Heterogeneous Edge Systems. (arXiv:1911.06949v1 [cs\.DC]) https://t.co/0d9ToMXzDs
Memoirs: Distributed Machine Learning through Heterogeneous Edge Systems. https://t.co/DH50DVeplZ
drahmadbazzi: RT @Memoirs: Distributed Machine Learning through Heterogeneous Edge Systems. https://t.co/DH50DVeplZ
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unqiue Words: 0

1.998 Mikeys
#4. PES: Proactive Event Scheduling for Responsive and Energy-Efficient Mobile Web Computing
Yu Feng, Yuhao Zhu
Web applications are gradually shifting toward resource-constrained mobile devices. As a result, the Web runtime system must simultaneously address two challenges: responsiveness and energy-efficiency. Conventional Web runtime systems fall short due to their reactive nature: they react to a user event only after it is triggered. The reactive strategy leads to local optimizations that schedule event executions one at a time, missing global optimization opportunities. This paper proposes Proactive Event Scheduling (PES). The key idea of PES is to proactively anticipate future events and thereby globally coordinate scheduling decisions across events. Specifically, PES predicts events that are likely to happen in the near future using a combination of statistical inference and application code analysis. PES then speculatively executes future events ahead of time in a way that satisfies the QoS constraints of all the events while minimizing the global energy consumption. Fundamentally, PES unlocks more optimization opportunities by...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unqiue Words: 0

1.998 Mikeys
#5. A Spark ML driven preprocessing approach for deep learning based scholarly data applications
Samiya Khan, Xiufeng Liu, Mansaf Alam
Big data has found applications in multiple domains. One of the largest sources of textual big data is scientific documents and papers. Big scholarly data have been used in numerous ways to create innovative applications such as collaborator discovery, expert finding and research management systems. With the advent of advanced machine and deep learning techniques, the accuracy and novelty of such applications have risen manifold. However, the biggest challenge in the development of deep learning models for scholarly applications in cloud based environment is the underutilization of resources because of the excessive time taken by textual preprocessing. This paper presents a preprocessing pipeline that makes use of Spark for data ingestion and Spark ML for pipelining preprocessing tasks. The evaluation of the proposed work is done using a case study, which uses LSTM based text summarization for generating title or summary from abstract of any research. The ingestion, preprocessing and cumulative time for the proposed approach...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unqiue Words: 0

1.998 Mikeys
#6. Towards Design Methodology of Efficient Fast Algorithms for Accelerating Generative Adversarial Networks on FPGAs
Jung-Woo Chang, Saehyun Ahn, Keon-Woo Kang, Suk-Ju Kang
Generative adversarial networks (GANs) have shown excellent performance in image and speech applications. GANs create impressive data primarily through a new type of operator called deconvolution (DeConv) or transposed convolution (Conv). To implement the DeConv layer in hardware, the state-of-the-art accelerator reduces the high computational complexity via the DeConv-to-Conv conversion and achieves the same results. However, there is a problem that the number of filters increases due to this conversion. Recently, Winograd minimal filtering has been recognized as an effective solution to improve the arithmetic complexity and resource efficiency of the Conv layer. In this paper, we propose an efficient Winograd DeConv accelerator that combines these two orthogonal approaches on FPGAs. Firstly, we introduce a new class of fast algorithm for DeConv layers using Winograd minimal filtering. Since there are regular sparse patterns in Winograd filters, we further amortize the computational complexity by skipping zero weights. Secondly,...
more | pdf | html
Figures
Tweets
arxivml: "Towards Design Methodology of Efficient Fast Algorithms for Accelerating Generative Adversarial Networks on FPGAs"… https://t.co/58Co4lyDVM
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 4281
Unqiue Words: 1382

1.998 Mikeys
#7. Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU
Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Keshav Pingali
There is growing interest in graph mining algorithms such as motif counting. Generic graph mining systems have been developed to provide unified interfaces for programming these algorithms. However, existing systems take minutes or even hours to mine even simple patterns in moderate-sized graphs, which significantly limits their real-world usability. We present Pangolin, a high-performance and flexible in-memory graph mining framework targeting both shared-memory CPUs and GPUs. Pangolin is the first graph mining system that supports GPU processing. We provide a simple embedding-centric programming interface based on the extend-reduce-filter model, which enables user to specify application-specific knowledge like aggressive enumeration search space pruning and isomorphism test elimination. We also describe novel optimizations that exploit locality, reduce memory consumption, and mitigate overheads of dynamic memory allocation and synchronization. Evaluation on a 28-core CPU demonstrates that Pangolin outperforms Arabesque and...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 13392
Unqiue Words: 3644

1.998 Mikeys
#8. Exact Byzantine Consensus on Arbitrary Directed Graphs under Local Broadcast Model
Muhammad Samir Khan, Lewis Tseng, Nitin H. Vaidya
We consider Byzantine consensus in a synchronous system where nodes are connected by a network modeled as a directed graph, i.e., communication links between neighboring nodes are not necessarily bi-directional. The directed graph model is motivated by wireless networks wherein asymmetric communication links can occur. In the classical point-to-point communication model, a message sent on a communication link is private between the two nodes on the link. This allows a Byzantine faulty node to equivocate, i.e., send inconsistent information to its neighbors. This paper considers the local broadcast model of communication, wherein transmission by a node is received identically by all of its outgoing neighbors. This allows such neighbors to detect a faulty node's attempt to equivocate, effectively depriving the faulty nodes of the ability to send conflicting information to different neighbors. Prior work has obtained sufficient and necessary conditions on undirected graphs to be able to achieve Byzantine consensus under the local...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 14160
Unqiue Words: 1945

1.998 Mikeys
#9. A Code injection Method for Rapid Docker Image Building
Yujing Wang, Qinyang Bao
Docker images are built by layers, yet the current implementation has major inefficiencies that makes rebuilding of an image unnecessarily slow when changes in bottom layers are required. In this paper, we propose a code injection method that overcomes these efficiencies by targeting only the changed layer and then bypassing Docker's layer checksum process.
more | pdf | html
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 1873
Unqiue Words: 793

1.996 Mikeys
#10. Decentralization in Open Quorum Systems
Andrea Bracciali, Davide Grossi, Ronald de Haan
Decentralisation is one of the promises introduced by blockchain technologies: fair and secure interaction amongst peers with no dominant positions, single points of failure or censorship. Decentralisation, however, appears difficult to be formally defined, possibly a continuum property of systems that can be more or less decentralised, or can tend to decentralisation in their lifetime. In this paper we focus on decentralisation in quorum-based approaches to open (permissionless) consensus as illustrated in influential protocols such as the Ripple and Stellar protocols. Drawing from game theory and computational complexity, we establish limiting results concerning the decentralisation vs. safety trade-off in Ripple and Stellar, and we propose a novel methodology to formalise and quantitatively analyse decentralisation in this type of blockchains.
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unqiue Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 225,737 papers.

Search
Sort results based on if they are interesting or reproducible.
Interesting
Reproducible
Categories
All
Astrophysics
Cosmology and Nongalactic Astrophysics
Earth and Planetary Astrophysics
Astrophysics of Galaxies
High Energy Astrophysical Phenomena
Instrumentation and Methods for Astrophysics
Solar and Stellar Astrophysics
Condensed Matter
Disordered Systems and Neural Networks
Mesoscale and Nanoscale Physics
Materials Science
Other Condensed Matter
Quantum Gases
Soft Condensed Matter
Statistical Mechanics
Strongly Correlated Electrons
Superconductivity
Computer Science
Artificial Intelligence
Hardware Architecture
Computational Complexity
Computational Engineering, Finance, and Science
Computational Geometry
Computation and Language
Cryptography and Security
Computer Vision and Pattern Recognition
Computers and Society
Databases
Distributed, Parallel, and Cluster Computing
Digital Libraries
Discrete Mathematics
Data Structures and Algorithms
Emerging Technologies
Formal Languages and Automata Theory
General Literature
Graphics
Computer Science and Game Theory
Human-Computer Interaction
Information Retrieval
Information Theory
Machine Learning
Logic in Computer Science
Multiagent Systems
Multimedia
Mathematical Software
Numerical Analysis
Neural and Evolutionary Computing
Networking and Internet Architecture
Other Computer Science
Operating Systems
Performance
Programming Languages
Robotics
Symbolic Computation
Sound
Software Engineering
Social and Information Networks
Systems and Control
Economics
Econometrics
General Economics
Theoretical Economics
Electrical Engineering and Systems Science
Audio and Speech Processing
Image and Video Processing
Signal Processing
General Relativity and Quantum Cosmology
General Relativity and Quantum Cosmology
High Energy Physics - Experiment
High Energy Physics - Experiment
High Energy Physics - Lattice
High Energy Physics - Lattice
High Energy Physics - Phenomenology
High Energy Physics - Phenomenology
High Energy Physics - Theory
High Energy Physics - Theory
Mathematics
Commutative Algebra
Algebraic Geometry
Analysis of PDEs
Algebraic Topology
Classical Analysis and ODEs
Combinatorics
Category Theory
Complex Variables
Differential Geometry
Dynamical Systems
Functional Analysis
General Mathematics
General Topology
Group Theory
Geometric Topology
History and Overview
Information Theory
K-Theory and Homology
Logic
Metric Geometry
Mathematical Physics
Numerical Analysis
Number Theory
Operator Algebras
Optimization and Control
Probability
Quantum Algebra
Rings and Algebras
Representation Theory
Symplectic Geometry
Spectral Theory
Statistics Theory
Mathematical Physics
Mathematical Physics
Nonlinear Sciences
Adaptation and Self-Organizing Systems
Chaotic Dynamics
Cellular Automata and Lattice Gases
Pattern Formation and Solitons
Exactly Solvable and Integrable Systems
Nuclear Experiment
Nuclear Experiment
Nuclear Theory
Nuclear Theory
Physics
Accelerator Physics
Atmospheric and Oceanic Physics
Applied Physics
Atomic and Molecular Clusters
Atomic Physics
Biological Physics
Chemical Physics
Classical Physics
Computational Physics
Data Analysis, Statistics and Probability
Physics Education
Fluid Dynamics
General Physics
Geophysics
History and Philosophy of Physics
Instrumentation and Detectors
Medical Physics
Optics
Plasma Physics
Popular Physics
Physics and Society
Space Physics
Quantitative Biology
Biomolecules
Cell Behavior
Genomics
Molecular Networks
Neurons and Cognition
Other Quantitative Biology
Populations and Evolution
Quantitative Methods
Subcellular Processes
Tissues and Organs
Quantitative Finance
Computational Finance
Economics
General Finance
Mathematical Finance
Portfolio Management
Pricing of Securities
Risk Management
Statistical Finance
Trading and Market Microstructure
Quantum Physics
Quantum Physics
Statistics
Applications
Computation
Methodology
Machine Learning
Other Statistics
Statistics Theory
Feedback
Online
Stats
Tracking 225,737 papers.