Top 10 arXiv Papers Today
in Hardware Architecture
Despite the maturity of modern electronic design automation (EDA) tools,
designs optimized at the architectural stage may become sub-optimal after going
through the physical design flow. Adder design is a long-studied
fundamental problem in the VLSI industry, yet designers cannot achieve optimal
solutions by running EDA tools on the set of available prefix adder
architectures. In this paper, we enhance a state-of-the-art prefix adder
synthesis algorithm to obtain a much wider solution space in the architectural
domain. On top of that, a machine learning-based design space exploration
methodology is applied to predict the Pareto frontier of the adders in the physical
domain, which is infeasible to obtain by exhaustively running EDA tools on innumerable
architectural solutions. Considering the high cost of obtaining the true values
for learning, an active learning algorithm is utilized to select the
representative data during the learning process, which uses less labeled data while
achieving a better-quality Pareto frontier. Experimental results...
[O] https://t.co/l1WV3SdbVF Cross-layer Optimization for High Speed Adders: A Pareto Driven Machine Learning Approach. In ...
Total Words: 11792
Unique Words: 2890
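As a rough illustration of the Pareto frontier mentioned in the abstract above, the sketch below filters a list of design points down to the non-dominated ones. It is not the paper's learning-based method: the two metrics (delay and area, both minimized) and the function name pareto_frontier are assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical design point: (delay, area). Both metrics are assumed to be
# minimized; the abstract does not name the metrics it uses.
Point = Tuple[float, float]


def pareto_frontier(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (minimization)."""
    frontier: List[Point] = []
    best_area = float("inf")
    # Sort by delay, then area. Sweeping in that order, a point is on the
    # frontier only if it strictly improves on the best area seen so far.
    for delay, area in sorted(set(points)):
        if area < best_area:
            frontier.append((delay, area))
            best_area = area
    return frontier


designs = [(1.0, 5.0), (2.0, 3.0), (2.5, 4.0), (3.0, 1.0), (3.0, 2.0)]
print(pareto_frontier(designs))
```

With exhaustive evaluation every point in the list needs a full EDA run, which is the cost the abstract says the learning-based prediction is meant to avoid.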
Residue number systems (RNS) represent numbers by their remainders modulo a
set of relatively prime numbers. This paper proposes an efficient hardware
implementation of modular multiplication and of the modulo function (X(mod P)),
based on Boolean minimization. We report experiments showing a performance
advantage of up to 30 times for our approach vs. the results obtained by
state-of-the-art industrial tools.
Hardware realization of residue number system algorithms by Boolean functions minimization. https://t.co/i913RSjEOT
Total Words: 3476
Unique Words: 1218
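The RNS idea described in the abstract above can be shown in a few lines of software. This is only a sketch of the number representation, not the paper's Boolean-minimization hardware: the moduli (3, 5, 7) and the helper names to_rns, rns_mul and from_rns are assumptions chosen for the example, and it needs Python 3.8 or later for math.prod and the modular inverse form of pow.

```python
from math import prod
from typing import List, Sequence

# Hypothetical pairwise coprime moduli; they cover the range 0..104.
MODULI = (3, 5, 7)


def to_rns(x: int, moduli: Sequence[int] = MODULI) -> List[int]:
    """Represent x by its remainders modulo each modulus."""
    return [x % m for m in moduli]


def rns_mul(a: Sequence[int], b: Sequence[int],
            moduli: Sequence[int] = MODULI) -> List[int]:
    """Multiply channel by channel; no carries cross between channels."""
    return [(x * y) % m for x, y, m in zip(a, b, moduli)]


def from_rns(residues: Sequence[int],
             moduli: Sequence[int] = MODULI) -> int:
    """Rebuild the integer with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        partial = big_m // m
        # pow(partial, -1, m) is the modular inverse of partial modulo m.
        total += r * partial * pow(partial, -1, m)
    return total % big_m


product = rns_mul(to_rns(8), to_rns(11))
print(product, from_rns(product))
```

Each channel works on small independent remainders, which is why modular multiplication and the X mod P function are the operations a hardware implementation has to make cheap.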
Jeremie S. Kim,
DRAM provides a promising substrate for generating random numbers due to
three major reasons: 1) DRAM is composed of a large number of cells that are
susceptible to many different failure modes that can be exploited for random
number generation, 2) the high-bandwidth DRAM interface provides support for
high-throughput random number generation, and 3) DRAM is prevalent in many
commodity computing systems today, ranging from embedded devices to
high-performance computing platforms.
In this work, we propose a new DRAM-based true random number generator (TRNG)
that exploits error patterns resulting from the deliberate violation of DRAM
read access timing specifications. Specifically, by decreasing the DRAM row
activation latency (tRCD) below manufacturer-recommended specifications, we
induce read errors, or activation failures, that exhibit true random behavior.
We then aggregate the resulting data from multiple cells to obtain a TRNG
capable of continuous high-throughput operation.
To demonstrate that our TRNG design is viable on...
Total Words: 14974
Unique Words: 3661
Raw bit errors are common in NAND flash memory and will increase in the
future. These errors reduce flash reliability and limit the lifetime of a flash
memory device. We aim to improve flash reliability with a multitude of low-cost
architectural techniques. We show that NAND flash memory reliability can be
improved at low cost and with low performance overhead by deploying various
architectural techniques that are aware of higher-level application behavior
and underlying flash device characteristics.
We analyze flash error characteristics and workload behavior through
experimental characterization, and design new flash controller algorithms that
use the insights gained from our analysis to improve flash reliability at a low
cost. We investigate four directions through this approach. (1) We propose a
new technique called WARM that improves flash reliability by 12.9 times by
managing flash retention differently for write-hot data and write-cold data.
(2) We propose a new framework that learns an online flash channel model for
Total Words: 124832
Unique Words: 13105
The C language is getting more and more popular as a design and verification
language (DVL). SystemC, ParC, and Cx are based on C. C-models of the
design and verification environment can also be generated from new DVLs (e.g.
Chisel) or classical DVLs such as VHDL or Verilog. The execution of these
models is usually license-free and presumably faster than their alternative
counterparts (simulators). This paper proposes activity-dependent, ordered,
cycle-accurate (AOC) C-models to reduce simulation time. It compares the
results with alternative concepts. The paper also examines the execution of the
AOC C-model in a multithreaded processor environment.
Total Words: 4710
Unique Words: 1446
Enrique de Lucas,
Juan L. Aragón,
GPUs are one of the most energy-consuming components for real-time rendering
applications, since a large number of fragment shading computations and memory
accesses are involved. Main memory bandwidth is especially taxing for
battery-operated devices such as smartphones. Tile-Based Rendering GPUs divide
the screen space into multiple tiles that are independently rendered in on-chip
buffers, thus reducing memory bandwidth and energy consumption. We have
observed that, in many animated graphics workloads, a large number of screen
tiles have the same color across adjacent frames. In this paper, we propose
Rendering Elimination (RE), a novel micro-architectural technique that
accurately determines if a tile will be identical to the same tile in the
preceding frame before rasterization by means of comparing signatures. Since RE
identifies redundant tiles early in the graphics pipeline, it completely avoids
the computation and memory accesses of the most power-consuming stages of the
pipeline, which substantially reduces the execution time...
Total Words: 10110
Unique Words: 2572
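The signature comparison described in the abstract above can be sketched in software as follows. This is a loose illustration, not the Rendering Elimination hardware: the abstract does not say what the signature is computed over or which hash is used, so the choice of CRC32 over a tile's serialized inputs and the class name TileSignatureCache are assumptions for this example.

```python
import zlib
from typing import Dict, Tuple

TileId = Tuple[int, int]  # hypothetical (column, row) index of a screen tile


class TileSignatureCache:
    """Remembers one signature per tile from the preceding frame."""

    def __init__(self) -> None:
        self._previous: Dict[TileId, int] = {}

    def can_skip(self, tile: TileId, tile_inputs: bytes) -> bool:
        """Return True if the tile's inputs match the preceding frame.

        Assumption: tile_inputs is a byte serialization of everything that
        determines the tile's final color. CRC32 is used only for brevity.
        """
        signature = zlib.crc32(tile_inputs)
        unchanged = self._previous.get(tile) == signature
        self._previous[tile] = signature  # keep for the next frame
        return unchanged


cache = TileSignatureCache()
print(cache.can_skip((0, 0), b"frame inputs"))  # first frame: nothing to compare
print(cache.can_skip((0, 0), b"frame inputs"))  # same inputs in the next frame
```

Because two different inputs can in principle share a hash value, a real design has to choose a signature wide enough that a false match is acceptably unlikely; the sketch ignores that trade-off.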
Michael Bedford Taylor
The BaseJump Manycore Accelerator-Network is an open-source mesh-based
on-chip network designed by leveraging the Bespoke Silicon Group's 20+
years of experience in designing manycore architectures. It has been used in
the 16nm 511-core RISC-V compatible Celerity chip (Davidson et al., 2018),
forming the basis of both a 1 GHz 496-core RISC-V manycore and a 10-core
always-on low voltage complex. It was also used in the 180nm BSG Ten chip,
which featured ten cores and a mesh that extends over off-chip links to an
FPGA. To facilitate use by the open source community of the BaseJump Manycore
network, we explain the ideas, protocols, interfaces and potential uses of the
mesh network. We also show an example with source code that demonstrates how to
integrate user designs into the mesh network.
Total Words: 5407
Unique Words: 1723
To cope with the increasing demand and computational intensity of deep neural
networks (DNNs), industry and academia have turned to accelerator technologies.
In particular, FPGAs have been shown to provide a good balance between
performance and energy efficiency for accelerating DNNs. While significant
research has focused on how to build efficient layer processors, the
computational building blocks of DNN accelerators, relatively little attention
has been paid to the on-chip interconnects that sit between the layer
processors and the FPGA's DRAM controller.
We observe a disparity between DNN accelerator interfaces, which tend to
comprise many narrow ports, and FPGA DRAM controller interfaces, which tend to
be wide buses. This mismatch causes traditional interconnects to consume
significant FPGA resources. To address this problem, we designed Medusa: an
optimized FPGA memory interconnect which transposes data in the interconnect
fabric, tailoring the interconnect to the needs of DNN layer processors.
Compared to a traditional...
"Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interfaces", arXiv, Jul 11, 2018 (FPL 2018, Aug 27, 2018) https://t.co/qf9lk3Rs0A
Yongming Shen https://t.co/XwcRbdDHUi
Peter Milder, Stony Brook University https://t.co/wrJ9atrkm2 https://t.co/T7XqgDiNFi
[O] https://t.co/I6DO0P3GXq Medusa: A Scalable Interconnect for Many-Port DNN Accelerators and Wide DRAM Controller Interf...
Total Words: 6412
Unique Words: 1865
Erich F. Haratsch,
Compared to planar (i.e., two-dimensional) NAND flash memory, 3D NAND flash
memory uses a new flash cell design, and vertically stacks dozens of silicon
layers in a single chip. This allows 3D NAND flash memory to increase storage
density using a much less aggressive manufacturing process technology than
planar NAND flash memory. The circuit-level and structural changes in 3D NAND
flash memory significantly alter how different error sources affect the
reliability of the memory.
In this paper, through experimental characterization of real,
state-of-the-art 3D NAND flash memory chips, we find that 3D NAND flash memory
exhibits three new error sources that were not previously observed in planar
NAND flash memory: (1) layer-to-layer process variation, where the average
error rate of each 3D-stacked layer in a chip is significantly different; (2)
early retention loss, a new phenomenon where the number of errors due to charge
leakage increases quickly within several hours after programming; and (3)
retention interference, a new...
Improving 3D NAND Flash Memory Lifetime by Tolerating Early Retention Loss and Process Variation. https://t.co/WQ9Fl7fZcR
Total Words: 27610
Unique Words: 4203
Abdullah Giray Yağlıçkı,
William X. Liu,
Kevin K. Chang,
Main memory (DRAM) consumes as much as half of the total system power in a
computer today, resulting in a growing need to develop new DRAM architectures
and systems that consume less power. Researchers have long relied on DRAM power
models that are based on standardized current measurements provided by
vendors, called IDD values. Unfortunately, we find that these models are highly
inaccurate, and do not reflect the actual power consumed by real DRAM devices.
We perform the first comprehensive experimental characterization of the power
consumed by modern real-world DRAM modules. Our extensive characterization of
50 DDR3L DRAM modules from three major vendors yields four key new observations
about DRAM power consumption: (1) across all IDD values that we measure, the
current consumed by real DRAM modules varies significantly from the current
specified by the vendors; (2) DRAM power consumption strongly depends on the
data value that is read or written; (3) there is significant structural
variation, where the same banks and...
What Your DRAM Power Models Are Not Telling You: Lessons from a Detailed Experimental Study. https://t.co/3KrwIxpibO
Total Words: 23114
Unique Words: 4554