Top 10 Arxiv Papers Today in Distributed, Parallel, And Cluster Computing


1.997 Mikeys
#1. Auto-tuning of dynamic load balancing applied to 3D reverse time migration on multicore systems
Ítalo A. S. Assis, João B. Fernandes, Tiago Barros, Samuel Xavier-de-Souza
Reverse time migration (RTM) is an algorithm widely used in the oil and gas industry to process seismic data. It is a computationally intensive task that suits well in parallel computers. Because of it being massive and regular, this type of task is often equally and statically distributed among the available parallel workers. However, this strategy is often not optimal. When the workers are heterogeneous, and even when most have similar processing power, many of them might still have to wait idly for the slower workers. In this paper, we show that even small performance differences between homogeneous cores can considerably affect the overall performance of a 3D RTM application. We show that dynamic load distribution has a significant advantage over the conventional static distribution. However, the granularity of the dynamically distributed chunks of work plays a key role in harvesting this advantage. In order to find the optimal granularity, we propose a coupled simulated annealing (CSA) based auto-tuning strategy that adjusts...
more | pdf | html
Figures
None.
Tweets
arxiv_org: Auto-tuning of dynamic load balancing applied to 3D reverse time migration on multicore s... https://t.co/x6Iq7UvS5u https://t.co/lYwGp8iJba
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 7049
Unqiue Words: 2077

1.997 Mikeys
#2. On the complexity of fault-tolerant consensus
Dariusz R. Kowalski, Jaroslaw Mirek
The paper studies the problem of reaching agreement in a distributed message-passing system prone to crash failures. Crashes are generated by \constrained\ adversaries - a \wadapt\ adversary, who has to fix in advance the set of $f$ crash-prone processes, or a \chainadapt\ adversary, who orders all the processes into $k$ disjoint chains and has to follow this pattern when crashing them. Apart from these constraints, both of them may crash processes in an adaptive way at any time. While commonly used \sadapt\ adversaries model attacks and \noadapt\ ones -- pre-defined faults, the constrained adversaries model more realistic scenarios when there are fault-prone dependent processes, e.g., in hierarchical or dependable software/hardware systems. We propose time-efficient consensus algorithms against such adversaries and also show how to improve the message complexity of proposed solutions. Finally, we show how to reach consensus against a \kthick\ adversary, limited by an arbitrary partial order \dk{with a maximal anti-chain of length...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 10052
Unqiue Words: 2201

1.995 Mikeys
#3. Custom Execution Environments with Containers in Pegasus-enabled Scientific Workflows
Karan Vahi, Mats Rynge, George Papadimitriou, Duncan A. Brown, Rajiv Mayani, Rafael Ferreira da Silva, Ewa Deelman, Anirban Mandal, Eric Lyons, Michael Zink
Science reproducibility is a cornerstone feature in scientific workflows. In most cases, this has been implemented as a way to exactly reproduce the computational steps taken to reach the final results. While these steps are often completely described, including the input parameters, datasets, and codes, the environment in which these steps are executed is only described at a higher level with endpoints and operating system name and versions. Though this may be sufficient for reproducibility in the short term, systems evolve and are replaced over time, breaking the underlying workflow reproducibility. A natural solution to this problem is containers, as they are well defined, have a lifetime independent of the underlying system, and can be user-controlled so that they can provide custom environments if needed. This paper highlights some unique challenges that may arise when using containers in distributed scientific workflows. Further, this paper explores how the Pegasus Workflow Management System implements container support to...
more | pdf | html
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 10
Total Words: 8185
Unqiue Words: 2405

1.995 Mikeys
#4. Broadcast Congested Clique: Planted Cliques and Pseudorandom Generators
Lijie Chen, Ofer Grossman
We develop techniques to prove lower bounds for the BCAST(log n) Broadcast Congested Clique model (a distributed message passing model where in each round, each processor can broadcast an O(log n)-sized message to all other processors). Our techniques are built to prove bounds for natural input distributions. So far, all lower bounds for problems in the model relied on constructing specifically tailored graph families for the specific problem at hand, resulting in lower bounds for artificially constructed inputs, instead of natural input distributions. One of our results is a lower bound for the directed planted clique problem. In this problem, an input graph is either a random directed graph (each directed edge is included with probability 1/2), or a random graph with a planted clique of size k. That is, k randomly chosen vertices have all of the edges between them included, and all other edges in the graph appear with probability 1/2. The goal is to determine whether a clique exists. We show that when k = n^(1/4 - eps), this...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 22767
Unqiue Words: 3436

1.995 Mikeys
#5. Online Research Report: rDLB: A Novel Approach for Robust Dynamic Load Balancing of Scientific Applications with Parallel Independent Tasks
Ali Mohammed, Aurelien Cavelan, Florina M. Ciorba
Scientific applications often contain large and computationally intensive parallel loops. Dynamic loop self scheduling (DLS) is used to achieve a balanced load execution of such applications on high performance computing (HPC) systems. Large HPC systems are vulnerable to processors or node failures and perturbations in the availability of resources. Most self-scheduling approaches do not consider fault-tolerant scheduling or depend on failure or perturbation detection and react by rescheduling failed tasks. In this work, a robust dynamic load balancing (rDLB) approach is proposed for the robust self scheduling of independent tasks. The proposed approach is proactive and does not depend on failure or perturbation detection. The theoretical analysis of the proposed approach shows that it is linearly scalable and its cost decrease quadratically by increasing the system size. rDLB is integrated into an MPI DLS library to evaluate its performance experimentally with two computationally intensive scientific applications. Results show...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 8495
Unqiue Words: 2063

1.995 Mikeys
#6. Hyaline: Fast and Transparent Lock-Free Memory Reclamation
Ruslan Nikolaev, Binoy Ravindran
We present a new lock-free safe memory reclamation algorithm, Hyaline, which is fast, scalable, and transparent to the underlying data structures. Due to very low contention, Hyaline is generally faster than existing approaches, including epoch-based reclamation. Moreover, Hyaline easily handles virtually unbounded number of threads (or any concurrent entities) that can be created and deleted dynamically, while retaining O(1) reclamation cost. Hyaline also does not depend on any OS abstractions or compiler support. Hyaline's full transparency and simple API enable easy integration into unmanaged C/C++ code, while not causing any undue burden on programmers, potentially making the entire process fully automatic through special language constructs. We also extend Hyaline to avoid situations where stalled threads prevent others from reclaiming newly allocated objects, a common problem with epoch-based reclamation. We have implemented and tested Hyaline on the x86(-64), ARM32/64, PPC, and MIPS architectures. The general approach...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 12548
Unqiue Words: 3001

1.995 Mikeys
#7. Massively Parallel Computation via Remote Memory Access
Soheil Behnezhad, Laxman Dhulipala, Hossein Esfandiari, Jakub Łącki, Warren Schudy, Vahab Mirrokni
We introduce the Adaptive Massively Parallel Computation (AMPC) model, which is an extension of the Massively Parallel Computation (MPC) model. At a high level, the AMPC model strengthens the MPC model by storing all messages sent within a round in a distributed data store. In the following round, all machines are provided with random read access to the data store, subject to the same constraints on the total amount of communication as in the MPC model. Our model is inspired by the previous empirical studies of distributed graph algorithms using MapReduce and a distributed hash table service. This extension allows us to give new graph algorithms with much lower round complexities compared to the best known solutions in the MPC model. In particular, in the AMPC model we show how to solve maximal independent set in $O(1)$ rounds and connectivity/minimum spanning tree in $O(\log\log_{m/n} n)$ rounds both using $O(n^\delta)$ space per machine for constant $\delta < 1$. In the same memory regime for MPC, the best known algorithms for...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 0
Unqiue Words: 0

1.995 Mikeys
#8. Locally Self-Adjusting Hypercubic Networks
Sikder Huq, Sukumar Ghosh
In a prior work (ICDCS 2017), we presented a distributed self-adjusting algorithm DSG for skip graphs. DSG performs topological adaption to communication pattern to minimize the average routing costs between communicating nodes. In this work, we present a distributed self-adjusting algorithm (referred to as DyHypes) for topological adaption in hypercubic networks. One of the major differences between hypercubes and skip graphs is that hypercubes are more rigid in structure compared skip graphs. This property makes self-adjustment significantly different in hypercubic networks than skip graphs. Upon a communication between an arbitrary pair of nodes, DyHypes transforms the network to place frequently communicating nodes closer to each other to maximize communication efficiency, and uses randomization in the transformation process to speed up the transformation and reduce message complexity. We show that, as compared to DSG, DyHypes reduces the transformation cost by a factor of $O(\log n)$, where $n$ is the number of nodes involved...
more | pdf | html
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 13487
Unqiue Words: 2031

1.995 Mikeys
#9. Distributed Algorithms for Subgraph-Centric Graph Platforms
Diptanshu Kakwani, Yogesh Simmhan
Graph analytics for large scale graphs has gained interest in recent years. Many graph algorithms have been designed for vertex-centric distributed graph processing frameworks to operate on large graphs with 100 M vertices and edges, using commodity clusters and Clouds. Subgraph-centric programming models have shown additional performance benefits than vertex-centric models. But direct mapping of vertex-centric and shared-memory algorithms to subgraph-centric frameworks are either not possible, or lead to inefficient algorithms. In this paper, we present three subgraph-centric distributed graph algorithms for triangle counting, clustering and minimum spanning forest, using variations of shared and vertex-centric models. These augment existing subgraph-centric algorithms that exist in literature, and allow a broader evaluation of these three classes of graph processing algorithms and platforms.
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unqiue Words: 0

1.978 Mikeys
#10. Instructions' Latencies Characterization for NVIDIA GPGPUs
Yehia Arafa, Abdel-Hameed Badawy, Gopinath Chennupati, Nandakishore Santhi, Stephan Eidenbenz
The last decade has seen a shift in the computer systems industry where heterogeneous computing has become prevalent. Nowadays, Graphics Processing Units (GPUs) are in a variety of systems from supercomputers to mobile phones and tablets. They are not only used for graphics operations but rather as general-purpose special hardware (GPGPUs) to boost the performance of compute-intensive applications. However, the percentage of undisclosed characteristics beyond what vendors provide is small. In this paper, we propose a very low overhead and portable analysis for exposing the hidden latency of each individual instruction executing in the pipeline and different access latencies of the various memory hierarchies at the microarchitecture level. We also show the impact of the possible optimizations a CUDA compiler have over the various latencies. We run our evaluation on seven different high-end NVIDIA GPUs from five different generations/architectures namely: Kepler, Maxwell, Pascal, Volta, and Turing. We believe that this work would...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unqiue Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 129,963 papers.

Search
Sort results based on if they are interesting or reproducible.
Interesting
Reproducible
Categories
All
Astrophysics
Cosmology and Nongalactic Astrophysics
Earth and Planetary Astrophysics
Astrophysics of Galaxies
High Energy Astrophysical Phenomena
Instrumentation and Methods for Astrophysics
Solar and Stellar Astrophysics
Condensed Matter
Disordered Systems and Neural Networks
Mesoscale and Nanoscale Physics
Materials Science
Other Condensed Matter
Quantum Gases
Soft Condensed Matter
Statistical Mechanics
Strongly Correlated Electrons
Superconductivity
Computer Science
Artificial Intelligence
Hardware Architecture
Computational Complexity
Computational Engineering, Finance, and Science
Computational Geometry
Computation and Language
Cryptography and Security
Computer Vision and Pattern Recognition
Computers and Society
Databases
Distributed, Parallel, and Cluster Computing
Digital Libraries
Discrete Mathematics
Data Structures and Algorithms
Emerging Technologies
Formal Languages and Automata Theory
General Literature
Graphics
Computer Science and Game Theory
Human-Computer Interaction
Information Retrieval
Information Theory
Machine Learning
Logic in Computer Science
Multiagent Systems
Multimedia
Mathematical Software
Numerical Analysis
Neural and Evolutionary Computing
Networking and Internet Architecture
Other Computer Science
Operating Systems
Performance
Programming Languages
Robotics
Symbolic Computation
Sound
Software Engineering
Social and Information Networks
Systems and Control
Economics
Econometrics
General Economics
Theoretical Economics
Electrical Engineering and Systems Science
Audio and Speech Processing
Image and Video Processing
Signal Processing
General Relativity and Quantum Cosmology
General Relativity and Quantum Cosmology
High Energy Physics - Experiment
High Energy Physics - Experiment
High Energy Physics - Lattice
High Energy Physics - Lattice
High Energy Physics - Phenomenology
High Energy Physics - Phenomenology
High Energy Physics - Theory
High Energy Physics - Theory
Mathematics
Commutative Algebra
Algebraic Geometry
Analysis of PDEs
Algebraic Topology
Classical Analysis and ODEs
Combinatorics
Category Theory
Complex Variables
Differential Geometry
Dynamical Systems
Functional Analysis
General Mathematics
General Topology
Group Theory
Geometric Topology
History and Overview
Information Theory
K-Theory and Homology
Logic
Metric Geometry
Mathematical Physics
Numerical Analysis
Number Theory
Operator Algebras
Optimization and Control
Probability
Quantum Algebra
Rings and Algebras
Representation Theory
Symplectic Geometry
Spectral Theory
Statistics Theory
Mathematical Physics
Mathematical Physics
Nonlinear Sciences
Adaptation and Self-Organizing Systems
Chaotic Dynamics
Cellular Automata and Lattice Gases
Pattern Formation and Solitons
Exactly Solvable and Integrable Systems
Nuclear Experiment
Nuclear Experiment
Nuclear Theory
Nuclear Theory
Physics
Accelerator Physics
Atmospheric and Oceanic Physics
Applied Physics
Atomic and Molecular Clusters
Atomic Physics
Biological Physics
Chemical Physics
Classical Physics
Computational Physics
Data Analysis, Statistics and Probability
Physics Education
Fluid Dynamics
General Physics
Geophysics
History and Philosophy of Physics
Instrumentation and Detectors
Medical Physics
Optics
Plasma Physics
Popular Physics
Physics and Society
Space Physics
Quantitative Biology
Biomolecules
Cell Behavior
Genomics
Molecular Networks
Neurons and Cognition
Other Quantitative Biology
Populations and Evolution
Quantitative Methods
Subcellular Processes
Tissues and Organs
Quantitative Finance
Computational Finance
Economics
General Finance
Mathematical Finance
Portfolio Management
Pricing of Securities
Risk Management
Statistical Finance
Trading and Market Microstructure
Quantum Physics
Quantum Physics
Statistics
Applications
Computation
Methodology
Machine Learning
Other Statistics
Statistics Theory
Feedback
Online
Stats
Tracking 129,963 papers.