### Top 10 Arxiv Papers Today in Performance

##### #1. Sprintz: Time Series Compression for the Internet of Things
###### Davis Blalock, Samuel Madden, John Guttag
Thanks to the rapid proliferation of connected devices, sensor-generated time series constitute a large and growing portion of the world's data. Often, this data is collected from distributed, resource-constrained devices and centralized at one or more servers. A key challenge in this setup is reducing the size of the transmitted data without sacrificing its quality. Lower quality reduces the data's utility, but smaller size enables both reduced network and storage costs at the servers and reduced power consumption in sensing devices. A natural solution is to compress the data at the sensing devices. Unfortunately, existing compression algorithms either violate the memory and latency constraints common for these devices or, as we show experimentally, perform poorly on sensor-generated time series. We introduce a time series compression algorithm that achieves state-of-the-art compression ratios while requiring less than 1KB of memory and adding virtually no latency. This method is suitable not only for low-power devices...
###### Tweets
ComputerPapers: Sprintz: Time Series Compression for the Internet of Things. https://t.co/w2Vhm2XPis
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 11888
Unique Words: 3444

##### #2. MARS: Memory Aware Reordered Source
###### Ishwar Bhati, Udit Dhawan, Jayesh Gaur, Sreenivas Subramoney, Hong Wang
Memory bandwidth is critical in today's high performance computing systems. The bandwidth is particularly paramount for GPU workloads such as 3D Gaming, Imaging and Perceptual Computing, GPGPU due to their data-intensive nature. As the number of threads and data streams in the GPUs increases with each generation, along with a high available memory bandwidth, memory efficiency is also crucial in order to achieve desired performance. In the presence of multiple concurrent data streams, the inherent locality in a single data stream is often lost as these streams are interleaved while moving through multiple levels of the memory system. In DRAM-based main memory, the poor request locality reduces row-buffer reuse, resulting in underutilized and inefficient memory bandwidth. In this paper we propose the Memory-Aware Reordered Source (*MARS*) architecture to address memory inefficiency arising from highly interleaved data streams. The key idea of *MARS* is that with a sufficiently large lookahead before the main memory, data streams...
###### Tweets
M157q_News_RSS: MARS: Memory Aware Reordered Source. (arXiv:1808.03518v1 [https://t.co/iN1HmYxuOB]) https://t.co/bz31i4V29O Memory bandwidth is critical in today's high perfo
###### Other stats
Sample Sizes : None.
Authors: 5
Total Words: 3542
Unique Words: 1395

##### #3. Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering
###### Ricardo Nobre, Luís Reis, João M. P. Cardoso
Automatic compiler phase selection/ordering has traditionally been focused on CPUs and, to a lesser extent, FPGAs. We present experiments regarding compiler phase ordering specialization of OpenCL kernels targeting a GPU. We use iterative exploration to specialize LLVM phase orders on 15 OpenCL benchmarks to an NVIDIA GPU. We analyze the generated NVIDIA PTX code for the various versions to identify the main causes of the most significant improvements and present results of a set of experiments that demonstrate the importance of using specific phase orders. Using specialized compiler phase orders, we were able to achieve geometric mean improvements of 1.54x (up to 5.48x) and 1.65x (up to 5.7x) over PTX generated by the NVIDIA CUDA compiler from CUDA versions of the same kernels, and over execution of the OpenCL kernels compiled from source with the NVIDIA OpenCL driver, respectively. We also evaluate the use of code-features in the OpenCL kernels. More specifically, we evaluate an approach that achieves geometric mean improvements...
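The abstract reports its results as geometric mean improvements. As a minimal sketch of how such a figure is computed from per-benchmark speedups (the function name and the sample values below are hypothetical and are not taken from the paper):

```python
import math


def geometric_mean(speedups):
    """Geometric mean of a list of positive speedup factors."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("speedups must be a non-empty list of positive numbers")
    # Average in log space, then map back; avoids overflow on long products.
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical per-benchmark speedups, for illustration only.
example = [1.0, 4.0]
print(geometric_mean(example))
```

The geometric mean is the usual choice for ratios such as speedups because one large outlier affects it less than it would affect an arithmetic mean.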
###### Tweets
ComputerPapers: Improving OpenCL Performance by Specializing Compiler Phase Selection and Ordering. https://t.co/ofoC2I1qAn
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 11304
Unique Words: 2618

##### #4. Performance analysis and optimization of the JOREK code for many-core CPUs
###### T. B. Fehér, M. Hölzl, G. Latu, G. T. A. Huijsmans
This report investigates the performance of the JOREK code on the Intel Knights Landing and Skylake processor architectures. The OpenMP scaling of the matrix construction part of the code was analyzed and improved synchronization methods were implemented. A new switch was implemented to control the number of threads used for the linear equation solver independently from other parts of the code. The matrix construction subroutine was vectorized, and the data locality was also improved. These steps led to a factor of two speedup for the matrix construction.
###### Tweets
ComputerPapers: Performance analysis and optimization of the JOREK code for many-core CPUs. https://t.co/fRnYxBJVzH
###### Other stats
Sample Sizes : None.
Authors: 4
Total Words: 5107
Unique Words: 1450

##### #5. Design and optimisation of an efficient HDF5 I/O kernel for massive parallel fluid flow simulations
###### Christoph Ertl, Jérôme Frisch, Ralf-Peter Mundani
More and more massive parallel codes running on several hundreds of thousands of cores enter the computational science and engineering domain, allowing high-fidelity computations on up to trillions of unknowns for very detailed analyses of the underlying problems. During such runs, typically gigabytes of data are being produced, hindering both efficient storage and (interactive) data exploration. Here, advanced approaches based on inherently distributed data formats such as HDF5 become necessary in order to avoid long latencies when storing the data and to support fast (random) access when retrieving the data for visual processing. Avoiding file locking and using collective buffering, write bandwidths to a single file close to the theoretical peak on a modern supercomputing cluster were achieved. The structure of the output file supports a very fast interactive visualisation and introduces additional steering functionality.
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 8187
Unique Words: 2529

##### #6. A Queuing Model for CPU Functional Unit and Issue Queue Configuration
###### Shane Carroll, Wei-Ming Ling
In a superscalar processor, instructions of various types flow through an execution pipeline, traversing hardware resources which are mostly shared among many different instruction types. A notable exception to shared pipeline resources is the collection of functional units, the hardware that performs specific computations. In a trade-off of cost versus performance, a pipeline designer must decide how many of each type of functional unit to place in a processor's pipeline. In this paper, we model a superscalar processor's issue queue and functional units as a novel queuing network. We treat the issue queue as a finite-sized waiting area and the functional units as servers. In addition to common queuing problems, customers of the network share the queue but wait for specific servers to become ready (e.g., addition instructions wait for adders). Furthermore, the customers in this queue are not necessarily ready for service, since instructions may be waiting for operands. In this paper we model a novel queuing network that provides a...
###### Other stats
Sample Sizes : None.
Authors: 2
Total Words: 10537
Unique Words: 2294

##### #7. Time-efficient Garbage Collection in SSDs
###### Lars Nagel, Tim Süß, Kevin Kremer, M. Umar Hameed, Lingfang Zeng, André Brinkmann
SSDs are currently replacing magnetic disks in many application areas. A challenge of the underlying flash technology is that data cannot be updated in-place. A block consisting of many pages must be completely erased before a single page can be rewritten. This victim block can still contain valid pages which need to be copied to other blocks before erasure. The objective of garbage collection strategies is to minimize write amplification induced by copying valid pages from victim blocks while minimizing the performance overhead of the victim selection. Victim selection strategies minimizing write amplification, like the cost-benefit approach, have linear runtime, while the write amplifications of time-efficient strategies, like the greedy strategy, significantly reduce the lifetime of SSDs. In this paper, we propose two strategies which optimize the performance of cost-benefit, while (almost) preserving its write amplification. Trace-driven simulations for single- and multi-channel SSDs show that the optimizations help to keep...
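To make the two victim-selection strategies named in the abstract concrete, here is a minimal sketch. It assumes the textbook forms of both policies: greedy picks the block with the fewest valid pages, and cost-benefit ranks blocks by `age * (1 - u) / (2 * u)`, where `u` is the fraction of valid pages. The `Block` type, `PAGES_PER_BLOCK`, and the sample values are hypothetical, and the sketch does not implement the two optimized strategies the paper proposes.

```python
from dataclasses import dataclass

PAGES_PER_BLOCK = 64  # assumed block size, for illustration only


@dataclass
class Block:
    block_id: int
    valid_pages: int   # pages that must be copied before erasure
    last_write: float  # time of the most recent write to this block


def greedy_victim(blocks):
    """Pick the block with the fewest valid pages (least copying now)."""
    return min(blocks, key=lambda b: b.valid_pages)


def cost_benefit_score(block, now):
    """Textbook cost-benefit score: age * (1 - u) / (2 * u)."""
    u = block.valid_pages / PAGES_PER_BLOCK
    if u == 0.0:
        return float("inf")  # nothing to copy, always the best victim
    age = now - block.last_write
    return age * (1.0 - u) / (2.0 * u)


def cost_benefit_victim(blocks, now):
    """Pick the block with the highest score; a linear scan over all blocks."""
    return max(blocks, key=lambda b: cost_benefit_score(b, now))


blocks = [
    Block(0, valid_pages=10, last_write=90.0),  # few valid pages, but hot
    Block(1, valid_pages=16, last_write=0.0),   # slightly fuller, but cold
    Block(2, valid_pages=60, last_write=0.0),   # nearly full
]
print(greedy_victim(blocks).block_id)
print(cost_benefit_victim(blocks, now=100.0).block_id)
```

In this example the two policies disagree: greedy takes the recently written block 0, while cost-benefit prefers the older block 1, whose remaining valid pages are less likely to be invalidated soon.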
###### Other stats
Sample Sizes : None.
Authors: 6
Total Words: 8891
Unique Words: 2273

##### #8. Average age of coupon type and related average age of information
###### George Kesidis
We consider two types of problems associated with a bufferless server with service preemption (push-out). One type is motivated by "adversarial" coupon collection with applications to modeling the surveillance of a set of servers by a botnet planning a DDoS attack on them collectively. The servers dynamically change according to a moving-target defense. Another type of problem has to do with a sequence of messages handled by the (transmission) server wherein each message obsoletes all previous ones. The objective is to assess the freshness of the latest message/information that has been successfully transmitted, i.e., "age of information".
###### Tweets
ComputerPapers: Average age of coupon type and related average age of information. https://t.co/wvgNDzaUyg
###### Other stats
Sample Sizes : None.
Authors: 1
Total Words: 3923
Unique Words: 1098

##### #9. A Stochastic Model for File Lifetime and Security in Data Center Networks
###### Quan-Lin Li, Fan-Qi Ma, Jing-Yu Ma
Data center networks are an important infrastructure in various applications of modern information technologies. Note that each data center always has a finite lifetime, thus once a data center fails, then it will lose all its storage files and useful information. For this, it is necessary to replicate and copy each important file into other data centers such that this file can increase its lifetime of staying in a data center network. In this paper, we describe a large-scale data center network with a file d-threshold policy, which is to replicate each important file into at most d-1 other data centers such that this file can maintain in the data center network under a given level of data security in the long-term. To this end, we develop three relevant Markov processes to propose two effective methods for assessing the file lifetime and data security. By using the RG-factorizations, we show that the two methods are used to be able to more effectively evaluate the file lifetime of large-scale data center networks. We hope the...
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 4463
Unique Words: 1118

##### #10. A refined mean field approximation of synchronous discrete-time population models
###### Nicolas Gast, Diego Latella, Mieke Massink
Mean field approximation is a popular method to study the behaviour of stochastic models composed of a large number of interacting objects. When the objects are asynchronous, the mean field approximation of a population model can be expressed as an ordinary differential equation. When the objects are (clock-) synchronous the mean field approximation is a discrete time dynamical system. We focus on the latter. We study the accuracy of mean field approximation when this approximation is a discrete-time dynamical system. We extend a result that was shown for the continuous time case and we prove that expected performance indicators estimated by mean field approximation are $O(1/N)$-accurate. We provide simple expressions to effectively compute the asymptotic error of mean field approximation, for finite time-horizon and steady-state, and we use this computed error to propose what we call a *refined* mean field approximation. We show, by using a few numerical examples, that this technique improves the quality of approximation...
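As a rough illustration of what a discrete-time mean field approximation looks like, the sketch below uses a hypothetical two-state synchronous model that is not taken from the paper: each of `n` objects is in state 0 or 1; at every clock tick an object in state 0 moves to state 1 with probability `b + a*x` (where `x` is the current fraction in state 1), and an object in state 1 returns to state 0 with probability `c`. The parameter names and values are assumptions, and the code shows only the classical approximation, not the refined one the authors propose.

```python
import random


def mean_field_step(x, a=0.5, b=0.1, c=0.3):
    """One step of the deterministic map x_{t+1} = x(1 - c) + (1 - x)(b + a x)."""
    p_up = min(1.0, b + a * x)
    return x * (1.0 - c) + (1.0 - x) * p_up


def mean_field_trajectory(x0, steps, **params):
    """Iterate the mean field map, returning x_0 .. x_steps."""
    xs = [x0]
    for _ in range(steps):
        xs.append(mean_field_step(xs[-1], **params))
    return xs


def simulate(n, x0, steps, a=0.5, b=0.1, c=0.3, seed=0):
    """Simulate the n-object synchronous model; return the final fraction in state 1."""
    rng = random.Random(seed)
    ones = round(n * x0)
    for _ in range(steps):
        x = ones / n
        p_up = min(1.0, b + a * x)
        # All objects update simultaneously, based on the same x.
        gained = sum(rng.random() < p_up for _ in range(n - ones))
        lost = sum(rng.random() < c for _ in range(ones))
        ones += gained - lost
    return ones / n


print(mean_field_trajectory(0.0, 50)[-1])
print(simulate(n=1000, x0=0.0, steps=50))
```

Comparing the two printed values for increasing `n` gives a feel for the approximation error that the abstract bounds as $O(1/N)$.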
###### Github

Paper "A Refined Mean Field Approximation for Synchronous Population Processes"

Repository: RefinedMeanField_SynchronousPopulation
User: ngast
Language: Jupyter Notebook
Stargazers: 0
Subscribers: 0
Forks: 0
Open Issues: 0
###### Other stats
Sample Sizes : None.
Authors: 3
Total Words: 12733
Unique Words: 2703

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 72,995 papers.
