Top 10 Arxiv Papers Today in Hardware Architecture


0.0 Mikeys
#1. Memory Vulnerability: A Case for Delaying Error Reporting
Luc Jaulmes, Miquel Moretó, Mateo Valero, Marc Casas
To face future reliability challenges, it is necessary to quantify the risk of error in any part of a computing system. To this goal, the Architectural Vulnerability Factor (AVF) has long been used for chips. However, this metric is used for offline characterisation, which is inappropriate for memory. We survey the literature and formalise one of the metrics used, the Memory Vulnerability Factor, and extend it to take into account false errors. These are reported errors which would have no impact on the program if they were ignored. We measure the False Error Aware MVF (FEA) and related metrics precisely in a cycle-accurate simulator, and compare them with the effects of injecting faults in a program's data, in native parallel runs. Our findings show that MVF and FEA are the only two metrics that are safe to use at runtime, as they both consistently give an upper bound on the probability of incorrect program outcome. FEA gives a tighter bound than MVF, and is the metric that correlates best with the incorrect outcome probability...
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 4661
Unique Words: 1585

0.0 Mikeys
#2. Implications of Integrated CPU-GPU Processors on Thermal and Power Management Techniques
Kapil Dev, Indrani Paul, Wei Huang, Yasuko Eckert, Wayne Burleson, Sherief Reda
Heterogeneous processors with architecturally different cores (CPU and GPU) integrated on the same die lead to new challenges and opportunities for thermal and power management techniques because of shared thermal/power budgets between these cores. In this paper, we show that new parallel programming paradigms (e.g., OpenCL) for CPU-GPU processors create a tighter coupling between the workload, the thermal/power management unit and the operating system. Using detailed thermal and power maps of the die from infrared imaging, we demonstrate that in contrast to traditional multi-core CPUs, heterogeneous processors exhibit higher coupled behavior for dynamic voltage and frequency scaling and workload scheduling, in terms of their effect on performance, power, and temperature. Further, we show that by taking the differences in core architectures and relative proximity of different computing cores on the die into consideration, better scheduling schemes could be implemented to reduce both the power density and peak temperature of the...
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 7722
Unique Words: 2154

0.0 Mikeys
#3. Is Leakage Power a Linear Function of Temperature?
Hameedah Sultan, Shashank Varshney, Smruti R Sarangi
In this work, we present a study of the leakage power modeling techniques commonly used in the architecture community. We further provide an analysis of the error in leakage power estimation using the various modeling techniques. We strongly believe that this study will help researchers determine an appropriate leakage model to use in their work, based on the desired modeling accuracy and speed.
Figures
None.
Tweets
ComputerPapers: Is Leakage Power a Linear Function of Temperature?. https://t.co/n0MMLsVWT7
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 2719
Unique Words: 891

0.0 Mikeys
#4. Deriving AOC C-Models from D&V Languages for Single- or Multi-Threaded Execution Using C or C++
Tobias Strauch
The C language is getting more and more popular as a design and verification language (DVL). SystemC, ParC [1] and Cx [2] are based on C. C-models of the design and verification environment can also be generated from new DVLs (e.g. Chisel [3]) or classical DVLs such as VHDL or Verilog. The execution of these models is usually license free and presumably faster than their alternative counterparts (simulators). This paper proposes activity-dependent, ordered, cycle-accurate (AOC) C-models to speed up simulation time. It compares the results with alternative concepts. The paper also examines the execution of the AOC C-model on a multithreaded processor environment.
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 4710
Unique Words: 1446

0.0 Mikeys
#5. Cross-layer Optimization for High Speed Adders: A Pareto Driven Machine Learning Approach
Yuzhe Ma, Subhendu Roy, Jin Miao, Jiamin Chen, Bei Yu
Despite the maturity of modern electronic design automation (EDA) tools, designs optimized at the architectural stage may become sub-optimal after going through the physical design flow. Adder design is a long-studied fundamental problem in the VLSI industry, yet designers cannot achieve optimal solutions by running EDA tools on the set of available prefix adder architectures. In this paper, we enhance a state-of-the-art prefix adder synthesis algorithm to obtain a much wider solution space in the architectural domain. On top of that, a machine learning-based design space exploration methodology is applied to predict the Pareto frontier of the adders in the physical domain, which is infeasible to obtain by exhaustively running EDA tools for innumerable architectural solutions. Considering the high cost of obtaining the true values for learning, an active learning algorithm is utilized to select the representative data during the learning process, which uses less labeled data while achieving a better quality of Pareto frontier. Experimental results...
Figures
Tweets
nmfeeds: [O] https://t.co/l1WV3SdbVF Cross-layer Optimization for High Speed Adders: A Pareto Driven Machine Learning Approach. In ...
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 11792
Unique Words: 2890

0.0 Mikeys
#6. On the Off-chip Memory Latency of Real-Time Systems: Is DDR DRAM Really the Best Option?
Mohamed Hassan
Predictable execution time upon accessing shared memories in multi-core real-time systems is a stringent requirement. A plethora of existing works focus on the analysis of Double Data Rate Dynamic Random Access Memories (DDR DRAMs), or redesigning its memory to provide predictable memory behavior. In this paper, we show that DDR DRAMs by construction suffer inherent limitations associated with achieving such predictability. These limitations lead to 1) highly variable access latencies that fluctuate based on various factors such as access patterns and memory state from previous accesses, and 2) overly pessimistic latency bounds. As a result, DDR DRAMs can be ill-suited for some real-time systems that mandate a strict predictable performance with tight timing constraints. Targeting these systems, we promote an alternative off-chip memory solution that is based on the emerging Reduced Latency DRAM (RLDRAM) protocol, and propose a predictable memory controller (RLDC) managing accesses to this memory. Comparing with the...
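The dependence of access latency on "memory state from previous accesses" can be illustrated with a toy open-page bank model. This is only a sketch and is not taken from the paper: the `Bank` class and the timing constants `T_RCD`, `T_CAS`, and `T_RP` (in cycles) are hypothetical values chosen for illustration, and real DDR timing involves many more constraints.

```python
# Hypothetical timing parameters in cycles; illustrative only, not from the paper.
T_RCD = 10  # activate (open a row) to column command
T_CAS = 10  # column command to data
T_RP = 10   # precharge (close the currently open row)


class Bank:
    """Toy model of one DRAM bank under an open-page policy."""

    def __init__(self):
        self.open_row = None  # no row is open initially

    def access(self, row):
        """Return the latency of a read to `row`, then leave that row open."""
        if self.open_row == row:
            latency = T_CAS                  # row hit: data is already in the row buffer
        elif self.open_row is None:
            latency = T_RCD + T_CAS          # bank idle: activate, then read
        else:
            latency = T_RP + T_RCD + T_CAS   # row conflict: precharge, activate, read
        self.open_row = row
        return latency


bank = Bank()
latencies = [bank.access(row) for row in [3, 3, 7, 7, 3]]
print(latencies)
```

In this sketch the same request costs between one and three timing steps depending only on what was accessed before it, so a worst-case bound has to assume the row-conflict case for every access.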
Figures
Tweets
ComputerPapers: On the Off-chip Memory Latency of Real-Time Systems: Is DDR DRAM Really the Best Option?. https://t.co/a9HYKRpKqh
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 8815
Unique Words: 2207

0.0 Mikeys
#7. An Area Efficient 2D Fourier Transform Architecture for FPGA Implementation
Atin Mukherjee, Debesh Choudhury
The two-dimensional Fourier transform plays a significant role in a variety of image processing problems, such as medical image processing, digital holography, correlation pattern recognition, hybrid digital optical processing, and optical computing. 2D spatial Fourier transformation involves a large number of image samples and hence requires huge hardware resources on field programmable gate arrays (FPGAs). In this paper, we present an area-efficient architecture of a 2D FFT processor that reuses the butterfly units multiple times. This is achieved by using a control unit that sends back the previously computed data of N/2 butterfly units to itself {log_2(N) - 1} times. A RAM controller is used to synchronize the flow of data samples between the functional blocks. The 2D FFT processor is simulated in VHDL and the results are verified on a Virtex-6 FPGA. The proposed method outperforms the conventional NxN point 2D FFT in terms of area, which is reduced by a factor of log_N(2) with a negligible increase in computation time.
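The idea of reusing one bank of N/2 butterflies across all log_2(N) stages can be sketched in software. The following is a minimal illustration, assuming a standard radix-2 decimation-in-time FFT and the usual row-column decomposition for the 2D case; it is not the authors' hardware architecture, and the function names `fft_reused_butterflies` and `fft2d` are hypothetical.

```python
import cmath


def fft_reused_butterflies(x):
    """Radix-2 FFT: the same N/2 butterflies are applied once per stage, log2(N) times."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    stages = n.bit_length() - 1

    # Bit-reversal permutation of the input samples.
    data = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{stages}b")[::-1], 2) if stages else 0
        data[rev] = complex(x[i])

    for s in range(stages):            # each pass feeds the previous results back in
        half = 1 << s
        for b in range(n // 2):        # the single bank of N/2 butterflies
            group, pos = divmod(b, half)
            top = group * 2 * half + pos
            bot = top + half
            w = cmath.exp(-2j * cmath.pi * pos / (2 * half))  # twiddle factor
            t = w * data[bot]
            data[top], data[bot] = data[top] + t, data[top] - t
    return data


def fft2d(image):
    """Row-column 2D FFT of a square image with power-of-two side length."""
    rows = [fft_reused_butterflies(r) for r in image]
    cols = [fft_reused_butterflies(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]  # transpose back to [row freq][col freq]


spectrum = fft2d([[1, 2, 3, 4], [0, 1, 0, 1], [2, 0, 2, 0], [5, 6, 7, 8]])
```

After the first pass, the results are fed back log_2(N) - 1 more times, which matches the feedback count quoted in the abstract; a naive implementation would instead instantiate a separate set of butterflies for every stage.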
Figures
Tweets
ComputerPapers: An Area Efficient 2D Fourier Transform Architecture for FPGA Implementation. https://t.co/4KK8zi6MWw
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 3280
Unique Words: 1065

0.0 Mikeys
#8. Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine
Andreas Kurth, Pirmin Vogel, Andrea Marongiu, Luca Benini
Shared virtual memory (SVM) is key in heterogeneous systems on chip (SoCs), which combine a general-purpose host processor with a many-core accelerator, both for programmability and to avoid data duplication. However, SVM can bring a significant run time overhead when translation lookaside buffer (TLB) entries are missing. Moreover, allowing DMA burst transfers to write SVM traditionally requires buffers to absorb transfers that miss in the TLB. These buffers have to be overprovisioned for the maximum burst size, wasting precious on-chip memory, and stall all SVM accesses once they are full, hampering the scalability of parallel accelerators. In this work, we present our SVM solution that avoids the majority of TLB misses with prefetching, supports parallel burst DMA transfers without additional buffers, and can be scaled with the workload and number of parallel processors. Our solution is based on three novel concepts: To minimize the rate of TLB misses, the TLB is proactively filled by compiler-generated Prefetching Helper...
Figures
Tweets
ComputerPapers: Scalable and Efficient Virtual Memory Sharing in Heterogeneous SoCs with TLB Prefetching and MMU-Aware DMA Engine. https://t.co/MnxFa7yN7h
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 8948
Unique Words: 2487

0.0 Mikeys
#9. What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study
Saugata Ghose, Abdullah Giray Yağlıçkı, Raghav Gupta, Donghyuk Lee, Kais Kudrolli, William X. Liu, Hasan Hassan, Kevin K. Chang, Niladrish Chatterjee, Aditya Agrawal, Mike O'Connor, Onur Mutlu
Main memory (DRAM) consumes as much as half of the total system power in a computer today, resulting in a growing need to develop new DRAM architectures and systems that consume less power. Researchers have long relied on DRAM power models that are based off of standardized current measurements provided by vendors, called IDD values. Unfortunately, we find that these models are highly inaccurate, and do not reflect the actual power consumed by real DRAM devices. We perform the first comprehensive experimental characterization of the power consumed by modern real-world DRAM modules. Our extensive characterization of 50 DDR3L DRAM modules from three major vendors yields four key new observations about DRAM power consumption: (1) across all IDD values that we measure, the current consumed by real DRAM modules varies significantly from the current specified by the vendors; (2) DRAM power consumption strongly depends on the data value that is read or written; (3) there is significant structural variation, where the same banks and...
Figures
Tweets
ComputerPapers: What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study. https://t.co/3KrwIxpibO
ElectronNest: What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study; https://t.co/0pDiTEU8Jh
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 12
Total Words: 23114
Unique Words: 4554

0.0 Mikeys
#10. TRINITY: Coordinated Performance, Energy and Temperature Management in 3D Processor-Memory Stacks
Karthik Rao, William Song, Yorai Wardi, Sudhakar Yalamanchili
The consistent demand for better performance has led to innovations at hardware and microarchitectural levels. 3D stacking of memory and logic dies delivers an order of magnitude improvement in available memory bandwidth. The price paid, however, is tight thermal constraints. In this paper, we study the complex multiphysics interactions between performance, energy and temperature. Using a cache coherent multicore processor cycle level simulator coupled with power and thermal estimation tools, we investigate the interactions between (a) thermal behaviors, (b) compute and memory microarchitecture, and (c) application workloads. The key insights from this exploration reveal the need to manage performance, energy and temperature in a coordinated fashion. Furthermore, we identify the concept of "effective heat capacity", i.e. the heat generated beyond which no further gains in performance are observed with increases in voltage-frequency of the compute logic. Subsequently, a real-time, numerical optimization based, application agnostic...
Figures
Tweets
ComputerPapers: TRINITY: Coordinated Performance, Energy and Temperature Management in 3D Processor-Memory Stacks. https://t.co/0ZCpTU0Uj1
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 12627
Unique Words: 3274

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 72,995 papers.
