Top 8 arXiv Papers Today in Distributed, Parallel, and Cluster Computing


2.017 Mikeys
#1. Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models
Matthew LeMay, Shijian Li, Tian Guo
Deep learning models are increasingly used for end-user applications, supporting both novel features, such as facial recognition, and traditional features, such as web search. To accommodate high inference throughput, it is common to host a single pre-trained Convolutional Neural Network (CNN) in dedicated cloud-based servers with hardware accelerators such as Graphics Processing Units (GPUs). However, GPUs can be orders of magnitude more expensive than traditional Central Processing Unit (CPU) servers. Under-utilized server resources brought about by dynamic workloads can influence provisioning decisions, which may result in inflated serving costs. One potential way to alleviate this problem is by allowing hosted models to share the underlying resources, which we refer to as multi-tenant inference serving. One of the key challenges is maximizing the resource efficiency for multi-tenant serving given hardware with diverse characteristics, models with unique response time Service Level Agreement (SLA), and dynamic...
Figures
None.
Tweets
arxivml: "Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models", Matthew LeMay, Shijian Li, T… https://t.co/fpbVOaGVis
arxiv_cs_LG: Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models. Matthew LeMay, Shijian Li, and Tian Guo https://t.co/L0hjWDxens
Memoirs: Perseus: Characterizing Performance and Cost of Multi-Tenant Serving for CNN Models. https://t.co/QatNxfeKfb
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 6273
Unique Words: 2093

1.997 Mikeys
#2. ELFISH: Resource-Aware Federated Learning on Heterogeneous Edge Devices
Zirui Xu, Zhao Yang, Jinjun Xiong, Jianlei Yang, Xiang Chen
In this work, we propose ELFISH - a resource-aware federated learning framework to tackle computation stragglers in federated learning. In ELFISH, the training consumption of neural network models is first profiled in terms of different computation resources. Guided by profiling, a "soft-training" method is proposed for straggler acceleration, which partially trains the model by masking a particular number of resource-intensive neurons. Rather than generating a deterministically optimized model with diverged structure, different sets of neurons are dynamically masked every training cycle and are recovered and updated during parameter aggregation, ensuring comprehensive model updates over time. A corresponding parameter aggregation scheme is also proposed to balance the contribution from soft-trained models and guarantee collaborative convergence. Eventually, ELFISH overcomes the computational heterogeneity of edge devices and achieves synchronized collaboration without computational stragglers. Experiments show that...
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 5865
Unique Words: 1738
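
The soft-training idea in the ELFISH abstract above (mask some neurons on a straggler each cycle, then recover them during aggregation) can be illustrated with a toy sketch. This is not the authors' implementation: the function names `soft_train_mask` and `aggregate` are hypothetical, the mask is chosen at random rather than by profiling resource-intensive neurons, and the aggregation rule (average each neuron over the clients that trained it) is an assumed stand-in for the scheme the paper proposes.

```python
import random


def soft_train_mask(num_neurons, num_masked, rng):
    """Pick which neurons a straggler skips this cycle (True = trained).

    Assumption: neurons are chosen uniformly at random; the paper selects
    them using resource profiling instead.
    """
    masked = set(rng.sample(range(num_neurons), num_masked))
    return [i not in masked for i in range(num_neurons)]


def aggregate(global_weights, client_updates):
    """Average each neuron's weight over the clients that trained it.

    client_updates is a list of (weights, mask) pairs. A neuron that no
    client trained this cycle keeps its previous global value.
    """
    new_weights = []
    for i, old in enumerate(global_weights):
        trained = [w[i] for w, mask in client_updates if mask[i]]
        new_weights.append(sum(trained) / len(trained) if trained else old)
    return new_weights


rng = random.Random(0)
global_weights = [0.0] * 6

# A fast client trains every neuron; a straggler skips two of them.
fast_client = ([1.0] * 6, [True] * 6)
straggler_mask = soft_train_mask(6, num_masked=2, rng=rng)
straggler_weights = [3.0 if trained else g
                     for trained, g in zip(straggler_mask, global_weights)]

updated = aggregate(global_weights,
                    [fast_client, (straggler_weights, straggler_mask)])
```

Because a fresh mask is drawn every cycle, each neuron on the straggler is still updated in some cycles, which is the property the abstract describes as comprehensive model updates over time.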

1.997 Mikeys
#3. Smart Parking: IoT and Blockchain
Abdul Wahab, Phil Maguire
Distributed ledger technology and IoT have revolutionized the world by finding applications in all domains. They promise to transform the digital infrastructure which powers extensive evolutions and impacts many areas. Vehicle parking is a major problem in large cities around the world, in both developed and developing countries. The common problems are the unavailability or shortage of parking spaces, no information about tariffs, and no means of searching for available parking spaces online. The struggle doesn't end even when an individual finds a spot, since they are then required to pay in cash. This traditional, manual process takes a lot of time and causes a lot of hassle. In this paper, we provide a novel solution to the parking problem using IoT and distributed ledger technology. This system is based on pervasive computing and provides auto check-in and check-out. Users can control the system and their profile using the app on their smartphone. The major advantage of the system is an easy online payment method. Users can pay...
Figures
None.
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0

1.997 Mikeys
#4. SimAS: A Simulation-assisted Approach for the Scheduling Algorithm Selection under Perturbations
Ali Mohammed, Florina M. Ciorba
Many scientific applications consist of large and computationally-intensive loops. Dynamic loop self-scheduling (DLS) techniques are used to parallelize and to balance the load during the execution of such applications. Load imbalance arises from variations in the loop iteration (or tasks) execution times, caused by problem, algorithmic, or systemic characteristics. The variations in systemic characteristics are referred to as perturbations, and can be caused by other applications or processes that share the same resources, or a temporary system fault or malfunction. Therefore, the selection of the most efficient DLS technique is critical to achieve the best application performance. The following question motivates this work: Given an application, an HPC system, and their characteristics and interplay, which DLS technique will achieve improved performance under unpredictable perturbations? Existing studies focus on variations in the delivered computational speed only as the source of perturbations in the system. However,...
Figures
None.
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 0
Unique Words: 0
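
To make the notion of a dynamic loop self-scheduling (DLS) technique in the SimAS abstract above concrete, here is a minimal sketch of one well-known example, guided self-scheduling (GSS). The abstract does not say which techniques SimAS selects among, so the choice of GSS is an assumption made purely for illustration, and the function name is hypothetical.

```python
import math


def guided_self_scheduling(num_iterations, num_workers):
    """Return the chunk sizes handed out by guided self-scheduling (GSS).

    Each time a worker requests work it receives ceil(remaining / P)
    loop iterations, so chunks start large (low scheduling overhead) and
    shrink toward the end of the loop (better load balance).
    """
    chunks = []
    remaining = num_iterations
    while remaining > 0:
        chunk = math.ceil(remaining / num_workers)
        chunks.append(chunk)
        remaining -= chunk
    return chunks


# Chunk sizes for a 100-iteration loop shared by 4 workers.
chunks = guided_self_scheduling(100, 4)
print(chunks)
```

Under perturbations, a worker that has been slowed down simply requests fewer chunks, which is why the choice of chunking rule affects how well the load stays balanced.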

1.997 Mikeys
#5. L3 Fusion: Fast Transformed Convolutions on CPUs
Rati Gelashvili, Nir Shavit, Aleksandar Zlateski
Fast convolutions via transforms, either Winograd or FFT, have emerged as a preferred way of performing the computation of convolutional layers, as they greatly reduce the number of required operations. Recent work shows that, for many layer structures, a well-designed implementation of fast convolutions can greatly utilize modern CPUs, significantly reducing the compute time. However, the generous amount of shared L3 cache present on modern CPUs is often neglected, and the algorithms are optimized solely for the private L2 cache. In this paper we propose an efficient "L3 Fusion" algorithm that is specifically designed for CPUs with a significant amount of shared L3 cache. Using the hierarchical roofline model, we show that in many cases, especially for layers with fewer channels, the "L3 fused" approach can greatly outperform the standard 3-stage one provided by big vendors such as Intel. We validate our theoretical findings by benchmarking our "L3 fused" implementation against the publicly available state of the art.
Tweets
Underfox3: In this paper, researchers have proposed an efficient algorithm for fast transformed convolutions specifically designed for CPUs with significant amount of shared L3 cache. #MachineLeaning https://t.co/Q1w3Oy7jso https://t.co/ci5mv22WLn
arxivml: "L3 Fusion: Fast Transformed Convolutions on CPUs", Rati Gelashvili, Nir Shavit, Aleksandar Zlateski https://t.co/ngAHuekGfQ
arxiv_cs_LG: L3 Fusion: Fast Transformed Convolutions on CPUs. Rati Gelashvili, Nir Shavit, and Aleksandar Zlateski https://t.co/oyNowyKqXk
Memoirs: L3 Fusion: Fast Transformed Convolutions on CPUs. https://t.co/LAOCkhrykh
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 6428
Unique Words: 1836
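
The claim in the L3 Fusion abstract above that transformed convolutions reduce the number of required operations can be seen in the smallest Winograd case, F(2,3): two outputs of a 3-tap filter computed with four multiplications instead of six. The sketch below shows only that arithmetic identity in one dimension; it is not the L3 Fusion algorithm and says nothing about cache behavior. The function names are hypothetical.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs: 6 multiplications."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """The same two outputs via Winograd F(2,3): 4 multiplications."""
    # Filter transform (can be precomputed once per filter).
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Input transform (additions and subtractions only).
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]
    # Element-wise stage: the only multiplications on the data path.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3
    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]


d = [1.0, 2.0, 3.0, 4.0]
g = [2.0, 4.0, 6.0]
print(direct_f23(d, g), winograd_f23(d, g))
```

The input transform, element-wise multiplication, and output transform are the three stages that a standard implementation runs one after another.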

1.995 Mikeys
#6. GPU Computing with Python: Performance, Energy Efficiency and Usability
Håvard H. Holm, André R. Brodtkorb, Martin L. Sætra
In this work, we examine the performance, energy efficiency and usability when using Python for developing HPC codes running on the GPU. We investigate the portability of performance and energy efficiency between CUDA and OpenCL; between GPU generations; and between low-end, mid-range and high-end GPUs. Our findings show that the impact of using Python is negligible for our applications, and furthermore, CUDA and OpenCL applications tuned to an equivalent level can in many cases obtain the same computational performance. Our experiments show that performance in general varies more between different GPUs than between using CUDA and OpenCL. We also show that tuning for performance is a good way of tuning for energy efficiency, but that specific tuning is needed to obtain optimal energy efficiency.
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 12876
Unique Words: 3189

1.995 Mikeys
#7. Efficient Deterministic Distributed Coloring with Small Bandwidth
Philipp Bamberger, Fabian Kuhn, Yannic Maus
We show that the $(degree+1)$-list coloring problem can be solved deterministically in $O(D \cdot \log n \cdot\log^3 \Delta)$ rounds in the CONGEST model, where $D$ is the diameter of the graph, $n$ the number of nodes, and $\Delta$ the maximum degree. Using the network decomposition algorithm from Rozhon and Ghaffari, this implies the first efficient deterministic, that is, $\text{poly}\log n$-time, CONGEST algorithm for the $(\Delta+1)$-coloring and the $(degree+1)$-list coloring problems. Previously, the best known algorithm required $2^{O(\sqrt{\log n})}$ rounds and was not based on network decompositions. Our results also imply deterministic $O(\log^3 \Delta)$-round algorithms in MPC and the CONGESTED CLIQUE.
Figures
None.
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 3
Total Words: 15218
Unique Words: 2683
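
For readers unfamiliar with the $(degree+1)$-list coloring problem named in the abstract above: every node has its own list of at least deg(v)+1 allowed colors and must pick one that differs from all its neighbors' choices. The sketch below is a sequential greedy solver that shows why such lists always suffice. It is not the distributed CONGEST algorithm from the paper, and the function name and example graph are made up for illustration.

```python
def greedy_list_coloring(adjacency, color_lists):
    """Color nodes one at a time, each from its own list.

    Assumes len(color_lists[v]) >= deg(v) + 1 for every node v. At most
    deg(v) colors can be taken by neighbors, so a free color always exists.
    """
    coloring = {}
    for v in adjacency:
        taken = {coloring[u] for u in adjacency[v] if u in coloring}
        coloring[v] = next(c for c in color_lists[v] if c not in taken)
    return coloring


# A triangle a-b-c with a pendant node d attached to c.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
color_lists = {
    "a": [1, 2, 3],
    "b": [2, 3, 4],
    "c": [1, 2, 3, 5],
    "d": [1, 7],
}
coloring = greedy_list_coloring(adjacency, color_lists)
print(coloring)
```

The difficulty the paper addresses is doing this in few communication rounds with small messages, where nodes cannot simply wait their turn as the loop above does.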

1.995 Mikeys
#8. Probabilistic Dynamic Hard Real-Time Scheduling in HPC
Florian Hofer, Martin A. Sehr, Alberto Sangiovanni-Vincentelli, Barbara Russo
Industry 4.0 is fundamentally changing the way data is collected, stored and analyzed in industrial processes, enabling novel applications such as flexible manufacturing of highly customized products. Real-time control of these processes, however, has not yet realized its full potential in using the data collected to drive further development. We believe that modern virtualization techniques, specifically application containers, present a unique opportunity to decouple control functionality from associated plants and fully realize the potential for highly distributed and transferable industrial processes even with real-time constraints arising from time-critical sub-processes. In this paper, we explore the challenges and opportunities of shifting industrial control software from dedicated hardware to bare-metal servers or (edge) cloud computing platforms using off-the-shelf technology. We present a specifically developed orchestration tool that can manage the execution of containerized applications on shared resources without...
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 1951
Unique Words: 994

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real time) based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 234,442 papers.
