Top 10 Arxiv Papers Today in Hardware Architecture


0.0 Mikeys
#1. Cross-layer Optimization for High Speed Adders: A Pareto Driven Machine Learning Approach
Yuzhe Ma, Subhendu Roy, Jin Miao, Jiamin Chen, Bei Yu
In spite of maturity to the modern electronic design automation (EDA) tools, optimized designs at architectural stage may become sub-optimal after going through physical design flow. Adder design has been such a long studied fundamental problem in VLSI industry yet designers cannot achieve optimal solutions by running EDA tools on the set of available prefix adder architectures. In this paper, we enhance a state-of-the-art prefix adder synthesis algorithm to obtain a much wider solution space in architectural domain. On top of that, a machine learning-based design space exploration methodology is applied to predict the Pareto frontier of the adders in physical domain, which is infeasible by exhaustively running EDA tools for innumerable architectural solutions. Considering the high cost of obtaining the true values for learning, an active learning algorithm is utilized to select the representative data during learning process, which uses less labeled data while achieving better quality of Pareto frontier. Experimental results...
Tweets
nmfeeds: [O] https://t.co/l1WV3SdbVF Cross-layer Optimization for High Speed Adders: A Pareto Driven Machine Learning Approach. In ...
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 11792
Unique Words: 2890
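The abstract of #1 centers on the Pareto frontier of adder designs in the physical domain. As a minimal sketch of that notion only, the snippet below filters a list of candidate points down to the non-dominated ones, assuming two objectives that are both minimized. The `(area, delay)` pairing and the sample values are illustrative assumptions, not data from the paper, and the paper's machine-learning prediction and active-learning steps are not reproduced here.

```python
def pareto_front(points):
    """Return the non-dominated points, minimizing both coordinates.

    Each point is a hypothetical (area, delay) pair. A point is kept only
    if no other point is at least as good in both objectives.
    """
    front = []
    best_delay = float("inf")
    # Sorting by area (then delay) means a later point can only be
    # non-dominated if it strictly improves on the best delay seen so far.
    for area, delay in sorted(set(points)):
        if delay < best_delay:
            front.append((area, delay))
            best_delay = delay
    return front


# Made-up design points for illustration.
candidates = [(1, 5), (2, 3), (3, 4), (2, 6), (4, 1)]
front = pareto_front(candidates)
print(front)
```

An exhaustive filter like this needs the true objective values of every candidate, which is the cost the abstract says makes running EDA tools on all architectures infeasible.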

0.0 Mikeys
#2. Hardware realization of residue number system algorithms by Boolean functions minimization
Danila Gorodecky, Tiziano Villa
Residue number systems (RNS) represent numbers by their remainders modulo a set of relatively prime numbers. This paper proposes an efficient hardware implementation of modular multiplication and of the modulo function (X(mod P)), based on Boolean minimization. We report experiments showing a performance advantage up to 30 times for our approach vs. the results obtained by state-of-art industrial tools.
Tweets
ComputerPapers: Hardware realization of residue number system algorithms by Boolean functions minimization. https://t.co/i913RSjEOT
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 3476
Unique Words: 1218
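The residue representation described in the abstract of #2 can be illustrated in software. The sketch below converts integers to residues, multiplies them channel by channel, and reconstructs the result with the Chinese Remainder Theorem. The moduli `(7, 11, 13)` and all function names are assumptions chosen for illustration; the paper's hardware approach based on Boolean minimization is not what this code implements.

```python
from math import prod

# Hypothetical pairwise coprime moduli; representable range is 0..1000.
MODULI = (7, 11, 13)


def to_rns(x, moduli=MODULI):
    """Represent x by its remainders modulo each modulus."""
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli=MODULI):
    """Multiply two RNS values independently in each residue channel."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Reconstruct the integer via the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        # pow(Mi, -1, m) is the modular inverse of Mi mod m (Python 3.8+).
        total += r * Mi * pow(Mi, -1, m)
    return total % M


product = from_rns(rns_mul(to_rns(12), to_rns(34)))
print(product)  # 408, since 12 * 34 = 408 fits in the range
```

Each residue channel works on small operands with no carries between channels, which is why the per-modulus multiplication and X mod P blocks are natural targets for a dedicated hardware implementation.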

0.0 Mikeys
#3. D-RaNGe: Violating DRAM Timing Constraints for High-Throughput True Random Number Generation using Commodity DRAM Devices
Jeremie S. Kim, Minesh Patel, Hasan Hassan, Lois Orosa, Onur Mutlu
DRAM provides a promising substrate for generating random numbers due to three major reasons: 1) DRAM is composed of a large number of cells that are susceptible to many different failure modes that can be exploited for random number generation, 2) the high-bandwidth DRAM interface provides support for high-throughput random number generation, and 3) DRAM is prevalent in many commodity computing systems today, ranging from embedded devices to high-performance computing platforms. In this work, we propose a new DRAM-based true random number generator (TRNG) that exploits error patterns resulting from the deliberate violation of DRAM read access timing specifications. Specifically, by decreasing the DRAM row activation latency (tRCD) below manufacturer-recommended specifications, we induce read errors, or activation failures, that exhibit true random behavior. We then aggregate the resulting data from multiple cells to obtain a TRNG capable of continuous high-throughput operation. To demonstrate that our TRNG design is viable on...
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 14974
Unique Words: 3661

0.0 Mikeys
#4. Architectural Techniques for Improving NAND Flash Memory Reliability
Yixin Luo
Raw bit errors are common in NAND flash memory and will increase in the future. These errors reduce flash reliability and limit the lifetime of a flash memory device. We aim to improve flash reliability with a multitude of low-cost architectural techniques. We show that NAND flash memory reliability can be improved at low cost and with low performance overhead by deploying various architectural techniques that are aware of higher-level application behavior and underlying flash device characteristics. We analyze flash error characteristics and workload behavior through experimental characterization, and design new flash controller algorithms that use the insights gained from our analysis to improve flash reliability at a low cost. We investigate four directions through this approach. (1) We propose a new technique called WARM that improves flash reliability by 12.9 times by managing flash retention differently for write-hot data and write-cold data. (2) We propose a new framework that learns an online flash channel model for each...
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 124832
Unique Words: 13105

0.0 Mikeys
#5. Deriving AOC C-Models from D&V Languages for Single- or Multi-Threaded Execution Using C or C++
Tobias Strauch
The C language is getting more and more popular as a design and verification language (DVL). SystemC, ParC [1] and Cx [2] are based on C. C-models of the design and verification environment can also be generated from new DVLs (e.g. Chisel [3]) or classical DVLs such as VHDL or Verilog. The execution of these models is usually license free and presumably faster than their alternative counterparts (simulators). This paper proposes activity-dependent, ordered, cycle-accurate (AOC) C-models to speed up simulation time. It compares the results with alternative concepts. The paper also examines the execution of the AOC C-model on a multithreaded processor environment.
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 4710
Unique Words: 1446

0.0 Mikeys
#6. Rendering Elimination: Early Discard of Redundant Tiles in the Graphics Pipeline
Martí Anglada, Enrique de Lucas, Joan-Manuel Parcerisa, Juan L. Aragón, Pedro Marcuello, Antonio González
GPUs are one of the most energy-consuming components for real-time rendering applications, since a large number of fragment shading computations and memory accesses are involved. Main memory bandwidth is especially taxing battery-operated devices such as smartphones. Tile-Based Rendering GPUs divide the screen space into multiple tiles that are independently rendered in on-chip buffers, thus reducing memory bandwidth and energy consumption. We have observed that, in many animated graphics workloads, a large number of screen tiles have the same color across adjacent frames. In this paper, we propose Rendering Elimination (RE), a novel micro-architectural technique that accurately determines if a tile will be identical to the same tile in the preceding frame before rasterization by means of comparing signatures. Since RE identifies redundant tiles early in the graphics pipeline, it completely avoids the computation and memory accesses of the most power consuming stages of the pipeline, which substantially reduces the execution time...
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 6
Total Words: 10110
Unique Words: 2572
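The abstract of #6 describes deciding, before rasterization, whether a tile will match the same tile in the preceding frame by comparing signatures. A rough software analogy of that check is sketched below. The use of `zlib.crc32` as the signature, the representation of a tile's inputs as a byte string, and the function name `tiles_to_render` are all assumptions for illustration; the abstract does not say how the hardware computes its signatures.

```python
import zlib


def tiles_to_render(prev_signatures, tile_inputs):
    """Return (tile ids that must be rendered, new signature table).

    prev_signatures maps tile id -> signature from the previous frame.
    tile_inputs maps tile id -> bytes standing in for everything that
    determines the tile's final colors in the current frame.
    """
    new_signatures = {}
    to_render = []
    for tile_id, data in tile_inputs.items():
        sig = zlib.crc32(data)  # assumed signature function
        new_signatures[tile_id] = sig
        # Render only if the tile is new or its signature changed.
        if prev_signatures.get(tile_id) != sig:
            to_render.append(tile_id)
    return to_render, new_signatures


frame1 = {0: b"sky", 1: b"player at x=10"}
frame2 = {0: b"sky", 1: b"player at x=12"}

render1, sigs = tiles_to_render({}, frame1)
render2, sigs = tiles_to_render(sigs, frame2)
print(render1, render2)
```

In this toy run the first frame renders both tiles and the second renders only tile 1. A checksum can collide, so a real design has to choose a signature wide enough that false matches are acceptably unlikely.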

0.0 Mikeys
#7. The BaseJump Manycore Accelerator Network
Shaolin Xie, Michael Bedford Taylor
The BaseJump Manycore Accelerator-Network is an open source mesh-based On-Chip-Network which is designed leveraging the Bespoke Silicon Group's 20+ years of experience in designing manycore architectures. It has been used in the 16nm 511-core RISC-V compatible Celerity chip Davidson et al. (2018), forming the basis of both a 1 GHz 496-core RISC-V manycore and a 10-core always-on low voltage complex. It was also used in the 180nm BSG Ten chip, which featured ten cores and a mesh that extends over off-chip links to an FPGA. To facilitate use by the open source community of the BaseJump Manycore network, we explain the ideas, protocols, interfaces and potential uses of the mesh network. We also show an example with source code that demonstrates how to integrate user designs into the mesh network.
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 5407
Unique Words: 1723

0.0 Mikeys
#8. Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interfaces
Yongming Shen, Tianchu Ji, Michael Ferdman, Peter Milder
To cope with the increasing demand and computational intensity of deep neural networks (DNNs), industry and academia have turned to accelerator technologies. In particular, FPGAs have been shown to provide a good balance between performance and energy efficiency for accelerating DNNs. While significant research has focused on how to build efficient layer processors, the computational building blocks of DNN accelerators, relatively little attention has been paid to the on-chip interconnects that sit between the layer processors and the FPGA's DRAM controller. We observe a disparity between DNN accelerator interfaces, which tend to comprise many narrow ports, and FPGA DRAM controller interfaces, which tend to be wide buses. This mismatch causes traditional interconnects to consume significant FPGA resources. To address this problem, we designed Medusa: an optimized FPGA memory interconnect which transposes data in the interconnect fabric, tailoring the interconnect to the needs of DNN layer processors. Compared to a traditional...
Figures
None.
Tweets
ogawa_tter: => "Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interfaces", arXiv, Jul 11, 2018 (FPL 2018, Aug 27, 2018) https://t.co/qf9lk3Rs0A Yongming Shen https://t.co/XwcRbdDHUi Peter Milder, Stony Brook University https://t.co/wrJ9atrkm2 https://t.co/T7XqgDiNFi
nmfeeds: [O] https://t.co/I6DO0P3GXq Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interf...
nmfeeds: [NE] https://t.co/I6DO0P3GXq Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Inter...
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 6412
Unique Words: 1865

0.0 Mikeys
#9. Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation
Yixin Luo, Saugata Ghose, Yu Cai, Erich F. Haratsch, Onur Mutlu
Compared to planar (i.e., two-dimensional) NAND flash memory, 3D NAND flash memory uses a new flash cell design, and vertically stacks dozens of silicon layers in a single chip. This allows 3D NAND flash memory to increase storage density using a much less aggressive manufacturing process technology than planar NAND flash memory. The circuit-level and structural changes in 3D NAND flash memory significantly alter how different error sources affect the reliability of the memory. In this paper, through experimental characterization of real, state-of-the-art 3D NAND flash memory chips, we find that 3D NAND flash memory exhibits three new error sources that were not previously observed in planar NAND flash memory: (1) layer-to-layer process variation, where the average error rate of each 3D-stacked layer in a chip is significantly different; (2) early retention loss, a new phenomenon where the number of errors due to charge leakage increases quickly within several hours after programming; and (3) retention interference, a new...
Figures
None.
Tweets
ComputerPapers: Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation. https://t.co/WQ9Fl7fZcR
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 27610
Unique Words: 4203

0.0 Mikeys
#10. What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study
Saugata Ghose, Abdullah Giray Yağlıçkı, Raghav Gupta, Donghyuk Lee, Kais Kudrolli, William X. Liu, Hasan Hassan, Kevin K. Chang, Niladrish Chatterjee, Aditya Agrawal, Mike O'Connor, Onur Mutlu
Main memory (DRAM) consumes as much as half of the total system power in a computer today, resulting in a growing need to develop new DRAM architectures and systems that consume less power. Researchers have long relied on DRAM power models that are based off of standardized current measurements provided by vendors, called IDD values. Unfortunately, we find that these models are highly inaccurate, and do not reflect the actual power consumed by real DRAM devices. We perform the first comprehensive experimental characterization of the power consumed by modern real-world DRAM modules. Our extensive characterization of 50 DDR3L DRAM modules from three major vendors yields four key new observations about DRAM power consumption: (1) across all IDD values that we measure, the current consumed by real DRAM modules varies significantly from the current specified by the vendors; (2) DRAM power consumption strongly depends on the data value that is read or written; (3) there is significant structural variation, where the same banks and...
Tweets
ComputerPapers: What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study. https://t.co/3KrwIxpibO
ElectronNest: What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study; https://t.co/0pDiTEU8Jh
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 12
Total Words: 23114
Unique Words: 4554

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 57,756 papers.
