Top 10 Arxiv Papers Today in Performance


0.0 Mikeys
#1. Sprintz: Time Series Compression for the Internet of Things
Davis Blalock, Samuel Madden, John Guttag
Thanks to the rapid proliferation of connected devices, sensor-generated time series constitute a large and growing portion of the world's data. Often, this data is collected from distributed, resource-constrained devices and centralized at one or more servers. A key challenge in this setup is reducing the size of the transmitted data without sacrificing its quality. Lower quality reduces the data's utility, but smaller size enables both reduced network and storage costs at the servers and reduced power consumption in sensing devices. A natural solution is to compress the data at the sensing devices. Unfortunately, existing compression algorithms either violate the memory and latency constraints common for these devices or, as we show experimentally, perform poorly on sensor-generated time series. We introduce a time series compression algorithm that achieves state-of-the-art compression ratios while requiring less than 1KB of memory and adding virtually no latency. This method is suitable not only for low-power devices...
more | pdf | html
Figures
Tweets
ComputerPapers: Sprintz: Time Series Compression for the Internet of Things. https://t.co/w2Vhm2XPis
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 11888
Unqiue Words: 3444

0.0 Mikeys
#2. MARS: Memory Aware Reordered Source
Ishwar Bhati, Udit Dhawan, Jayesh Gaur, Sreenivas Subramoney, Hong Wang
Memory bandwidth is critical in today's high performance computing systems. The bandwidth is particularly paramount for GPU workloads such as 3D Gaming, Imaging and Perceptual Computing, GPGPU due to their data-intensive nature. As the number of threads and data streams in the GPUs increases with each generation, along with a high available memory bandwidth, memory efficiency is also crucial in order to achieve desired performance. In presence of multiple concurrent data streams, the inherent locality in a single data stream is often lost as these streams are interleaved while moving through multiple levels of memory system. In DRAM based main memory, the poor request locality reduces row-buffer reuse resulting in underutilized and inefficient memory bandwidth. In this paper we propose Memory-Aware Reordered Source (\textit{MARS}) architecture to address memory inefficiency arising from highly interleaved data streams. The key idea of \textit{MARS} is that with a sufficiently large lookahead before the main memory, data streams...
more | pdf | html
Figures
None.
Tweets
M157q_News_RSS: MARS: Memory Aware Reordered Source. (arXiv:1808.03518v1 [https://t.co/iN1HmYxuOB]) https://t.co/bz31i4V29O Memory bandwidth is critical in today's high perfo
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 3542
Unqiue Words: 1395

0.0 Mikeys
#3. Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering
Ricardo Nobre, Luís Reis, João M. P. Cardoso
Automatic compiler phase selection/ordering has traditionally been focused on CPUs and, to a lesser extent, FPGAs. We present experiments regarding compiler phase ordering specialization of OpenCL kernels targeting a GPU. We use iterative exploration to specialize LLVM phase orders on 15 OpenCL benchmarks to an NVIDIA GPU. We analyze the generated NVIDIA PTX code for the various versions to identify the main causes of the most significant improvements and present results of a set of experiments that demonstrate the importance of using specific phase orders. Using specialized compiler phase orders, we were able to achieve geometric mean improvements of 1.54x (up to 5.48x) and 1.65x (up to 5.7x) over PTX generated by the NVIDIA CUDA compiler from CUDA versions of the same kernels, and over execution of the OpenCL kernels compiled from source with the NVIDIA OpenCL driver, respectively. We also evaluate the use of code-features in the OpenCL kernels. More specifically, we evaluate an approach that achieves geometric mean improvements...
more | pdf | html
Figures
Tweets
ComputerPapers: Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering. https://t.co/ofoC2I1qAn
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 11304
Unqiue Words: 2618

0.0 Mikeys
#4. Performance analysis and optimization of the JOREK code for many-core CPUs
T. B. Fehér, M. Hölzl, G. Latu, G. T. A. Huijsmans
This report investigates the performance of the JOREK code on the Intel Knights Landing and Skylake processor architectures. The OpenMP scaling of the matrix construction part of the code was analyzed and improved synchronization methods were implemented. A new switch was implemented to control the number of threads used for the linear equation solver independently from other parts of the code. The matrix construction subroutine was vectorized, and the data locality was also improved. These steps led to a factor of two speedup for the matrix construction.
more | pdf | html
Figures
Tweets
ComputerPapers: Performance analysis and optimization of the JOREK code for many-core CPUs. https://t.co/fRnYxBJVzH
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 5107
Unqiue Words: 1450

0.0 Mikeys
#5. Design and optimisation of an efficient HDF5 I/O kernel for massive parallel fluid flow simulations
Christoph Ertl, Jérôme Frisch, Ralf-Peter Mundani
More and more massive parallel codes running on several hundreds of thousands of cores enter the computational science and engineering domain, allowing high-fidelity computations on up to trillions of unknowns for very detailed analyses of the underlying problems. During such runs, typically gigabytes of data are being produced, hindering both efficient storage and (interactive) data exploration. Here, advanced approaches based on inherently distributed data formats such as HDF5 become necessary in order to avoid long latencies when storing the data and to support fast (random) access when retrieving the data for visual processing. Avoiding file locking and using collective buffering, write bandwidths to a single file close to the theoretical peak on a modern supercomputing cluster were achieved. The structure of the output file supports a very fast interactive visualisation and introduces additional steering functionality.
more | pdf | html
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 8187
Unqiue Words: 2529

0.0 Mikeys
#6. A Queuing Model for CPU Functional Unit and Issue Queue Configuration
Shane Carroll, Wei-Ming Ling
In a superscalar processor, instructions of various types flow through an execution pipeline, traversing hardware resources which are mostly shared among many different instruction types. A notable exception to shared pipeline resources is the collection of functional units, the hardware that performs specific computations. In a trade-off of cost versus performance, a pipeline designer must decide how many of each type of functional unit to place in a processor's pipeline. In this paper, we model a superscalar processor's issue queue and functional units as a novel queuing network. We treat the issue queue as a finite-sized waiting area and the functional units as servers. In addition to common queuing problems, customers of the network share the queue but wait for specific servers to become ready (e.g., addition instructions wait for adders). Furthermore, the customers in this queue are not necessary ready for service, since instructions may be waiting for operands. In this paper we model a novel queuing network that provides a...
more | pdf | html
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 10537
Unqiue Words: 2294

0.0 Mikeys
#7. Time-efficient Garbage Collection in SSDs
Lars Nagel, Tim Süß, Kevin Kremer, M. Umar Hameed, Lingfang Zeng, André Brinkmann
SSDs are currently replacing magnetic disks in many application areas. A challenge of the underlying flash technology is that data cannot be updated in-place. A block consisting of many pages must be completely erased before a single page can be rewritten. This victim block can still contain valid pages which need to be copied to other blocks before erasure. The objective of garbage collection strategies is to minimize write amplification induced by copying valid pages from victim blocks while minimizing the performance overhead of the victim selection. Victim selection strategies minimizing write amplification, like the cost-benefit approach, have linear runtime, while the write amplifications of time-efficient strategies, like the greedy strategy, significantly reduce the lifetime of SSDs. In this paper, we propose two strategies which optimize the performance of cost-benefit, while (almost) preserving its write amplification. Trace-driven simulations for single- and multi-channel SSDs show that the optimizations help to keep...
more | pdf | html
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 8891
Unqiue Words: 2273

0.0 Mikeys
#8. Average age of coupon type and related average age of information
George Kesidis
We consider two types of problems associated with a bufferless server with service preemption (push-out). One type is motivated by "adversarial" coupon collection with applications to modeling the surveillance of a set of servers by a botnet planning a DDoS attack on them collectively. The servers dynamically change according to a moving-target defense. Another type of problem has to do with a sequence of messages handled by the (transmission) server wherein each message obsoletes all previous ones. The objective is to assess the freshness of the latest message/information that has been successfully transmitted, i.e., "age of information".
more | pdf | html
Figures
Tweets
ComputerPapers: Average age of coupon type and related average age of information. https://t.co/wvgNDzaUyg
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 3923
Unqiue Words: 1098

0.0 Mikeys
#9. A Stochastic Model for File Lifetime and Security in Data Center Networks
Quan-Lin Li, Fan-Qi Ma, Jing-Yu Ma
Data center networks are an important infrastructure in various applications of modern information technologies. Note that each data center always has a finite lifetime, thus once a data center fails, then it will lose all its storage files and useful information. For this, it is necessary to replicate and copy each important file into other data centers such that this file can increase its lifetime of staying in a data center network. In this paper, we describe a large-scale data center network with a file d-threshold policy, which is to replicate each important file into at most d-1 other data centers such that this file can maintain in the data center network under a given level of data security in the long-term. To this end, we develop three relevant Markov processes to propose two effective methods for assessing the file lifetime and data security. By using the RG-factorizations, we show that the two methods are used to be able to more effectively evaluate the file lifetime of large-scale data center networks. We hope the...
more | pdf | html
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 4463
Unqiue Words: 1118

0.0 Mikeys
#10. A refined mean field approximation of synchronous discrete-time population models
Nicolas Gast, Diego Latella, Mieke Massink
Mean field approximation is a popular method to study the behaviour of stochastic models composed of a large number of interacting objects. When the objects are asynchronous, the mean field approximation of a population model can be expressed as an ordinary differential equation. When the objects are (clock-) synchronous the mean field approximation is a discrete time dynamical system. We focus on the latter.We study the accuracy of mean field approximation when this approximation is a discrete-time dynamical system. We extend a result that was shown for the continuous time case and we prove that expected performance indicators estimated by mean field approximation are $O(1/N)$-accurate. We provide simple expressions to effectively compute the asymptotic error of mean field approximation, for finite time-horizon and steady-state, and we use this computed error to propose what we call a \emph{refined} mean field approximation. We show, by using a few numerical examples, that this technique improves the quality of approximation...
more | pdf | html
Figures
Tweets
Github

Paper "A Refined Mean Field Approximation for Synchronous Population Processes"

Repository: RefinedMeanField_SynchronousPopulation
User: ngast
Language: Jupyter Notebook
Stargazers: 0
Subscribers: 0
Forks: 0
Open Issues: 0
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 12733
Unqiue Words: 2703

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 72,995 papers.

Search
Sort results based on if they are interesting or reproducible.
Interesting
Reproducible
Categories
All
Astrophysics
Cosmology and Nongalactic Astrophysics
Earth and Planetary Astrophysics
Astrophysics of Galaxies
High Energy Astrophysical Phenomena
Instrumentation and Methods for Astrophysics
Solar and Stellar Astrophysics
Condensed Matter
Disordered Systems and Neural Networks
Mesoscale and Nanoscale Physics
Materials Science
Other Condensed Matter
Quantum Gases
Soft Condensed Matter
Statistical Mechanics
Strongly Correlated Electrons
Superconductivity
Computer Science
Artificial Intelligence
Hardware Architecture
Computational Complexity
Computational Engineering, Finance, and Science
Computational Geometry
Computation and Language
Cryptography and Security
Computer Vision and Pattern Recognition
Computers and Society
Databases
Distributed, Parallel, and Cluster Computing
Digital Libraries
Discrete Mathematics
Data Structures and Algorithms
Emerging Technologies
Formal Languages and Automata Theory
General Literature
Graphics
Computer Science and Game Theory
Human-Computer Interaction
Information Retrieval
Information Theory
Machine Learning
Logic in Computer Science
Multiagent Systems
Multimedia
Mathematical Software
Numerical Analysis
Neural and Evolutionary Computing
Networking and Internet Architecture
Other Computer Science
Operating Systems
Performance
Programming Languages
Robotics
Symbolic Computation
Sound
Software Engineering
Social and Information Networks
Systems and Control
Economics
Econometrics
General Economics
Theoretical Economics
Electrical Engineering and Systems Science
Audio and Speech Processing
Image and Video Processing
Signal Processing
General Relativity and Quantum Cosmology
General Relativity and Quantum Cosmology
High Energy Physics - Experiment
High Energy Physics - Experiment
High Energy Physics - Lattice
High Energy Physics - Lattice
High Energy Physics - Phenomenology
High Energy Physics - Phenomenology
High Energy Physics - Theory
High Energy Physics - Theory
Mathematics
Commutative Algebra
Algebraic Geometry
Analysis of PDEs
Algebraic Topology
Classical Analysis and ODEs
Combinatorics
Category Theory
Complex Variables
Differential Geometry
Dynamical Systems
Functional Analysis
General Mathematics
General Topology
Group Theory
Geometric Topology
History and Overview
Information Theory
K-Theory and Homology
Logic
Metric Geometry
Mathematical Physics
Numerical Analysis
Number Theory
Operator Algebras
Optimization and Control
Probability
Quantum Algebra
Rings and Algebras
Representation Theory
Symplectic Geometry
Spectral Theory
Statistics Theory
Mathematical Physics
Mathematical Physics
Nonlinear Sciences
Adaptation and Self-Organizing Systems
Chaotic Dynamics
Cellular Automata and Lattice Gases
Pattern Formation and Solitons
Exactly Solvable and Integrable Systems
Nuclear Experiment
Nuclear Experiment
Nuclear Theory
Nuclear Theory
Physics
Accelerator Physics
Atmospheric and Oceanic Physics
Applied Physics
Atomic and Molecular Clusters
Atomic Physics
Biological Physics
Chemical Physics
Classical Physics
Computational Physics
Data Analysis, Statistics and Probability
Physics Education
Fluid Dynamics
General Physics
Geophysics
History and Philosophy of Physics
Instrumentation and Detectors
Medical Physics
Optics
Plasma Physics
Popular Physics
Physics and Society
Space Physics
Quantitative Biology
Biomolecules
Cell Behavior
Genomics
Molecular Networks
Neurons and Cognition
Other Quantitative Biology
Populations and Evolution
Quantitative Methods
Subcellular Processes
Tissues and Organs
Quantitative Finance
Computational Finance
Economics
General Finance
Mathematical Finance
Portfolio Management
Pricing of Securities
Risk Management
Statistical Finance
Trading and Market Microstructure
Quantum Physics
Quantum Physics
Statistics
Applications
Computation
Methodology
Machine Learning
Other Statistics
Statistics Theory
Feedback
Online
Stats
Tracking 72,995 papers.