We consider the problem of finding the minimizer of a function $f:
\mathbb{R}^d \rightarrow \mathbb{R}$ of the finite-sum form $\min_{w} f(w) =
\frac{1}{n}\sum_{i=1}^{n} f_i(w)$. This problem has been studied intensively in recent
years in the field of machine learning (ML). One promising approach for
large-scale data is to use a stochastic optimization algorithm to solve the
problem. SGDLibrary is a readable, flexible and extensible pure-MATLAB library
that collects stochastic optimization algorithms. The purpose of the
library is to provide researchers and implementers with a comprehensive evaluation
environment for the use of these algorithms on various ML problems.
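
As a rough illustration of the finite-sum setting described above, the following Python sketch runs plain stochastic gradient descent on a toy least-squares problem. It is not code from SGDLibrary, which is written in MATLAB; the names `sgd_finite_sum` and `least_squares_grad`, the step size, and the data are all assumptions made for this example.

```python
import random


def sgd_finite_sum(grad_i, n, w0, step_size=0.1, epochs=50, seed=0):
    """Minimize f(w) = (1/n) * sum_i f_i(w) with plain SGD.

    grad_i(i, w) must return the gradient of the i-th component f_i at w.
    All names and defaults here are hypothetical choices for illustration.
    """
    rng = random.Random(seed)
    w = list(w0)
    for _ in range(epochs):
        order = list(range(n))
        rng.shuffle(order)  # visit every component once per epoch, in random order
        for i in order:
            g = grad_i(i, w)
            w = [wj - step_size * gj for wj, gj in zip(w, g)]
    return w


# Toy data: f_i(w) = 0.5 * (x_i . w - y_i)^2 with noiseless targets.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
w_true = [2.0, -3.0]
Y = [sum(xj * wj for xj, wj in zip(x, w_true)) for x in X]


def least_squares_grad(i, w):
    """Gradient of the i-th squared residual with respect to w."""
    residual = sum(xj * wj for xj, wj in zip(X[i], w)) - Y[i]
    return [residual * xj for xj in X[i]]


w_hat = sgd_finite_sum(least_squares_grad, n=len(X), w0=[0.0, 0.0])
print(w_hat)
```

Because the toy targets are noiseless, a constant step size is enough for the iterates to approach `w_true`; with noisy data a decaying step size or a variance-reduced method would usually be needed.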

MATLAB library for stochastic optimization algorithms: Version 1.0.17

Stargazers: 49

Subscribers: 8

Forks: 24

Open Issues: 4

Sample Sizes : None.

Authors: 1

Total Words: 8646

Unique Words: 1788

Development systems for deep learning (DL), such as Theano, Torch,
TensorFlow, or MXNet, are easy-to-use tools for creating complex neural network
models. Since gradient computations are automatically baked in, and execution
is mapped to high performance hardware, these models can be trained end-to-end
on large amounts of data. However, it is currently not easy to implement many
basic machine learning primitives in these systems (such as Gaussian processes,
least squares estimation, principal components analysis, Kalman smoothing),
mainly because they lack efficient support of linear algebra primitives as
differentiable operators. We detail how a number of matrix decompositions
(Cholesky, LQ, symmetric eigen) can be implemented as differentiable operators.
We have implemented these primitives in MXNet, running on CPU and GPU in single
and double precision. We sketch use cases of these new operators, learning
Gaussian process and Bayesian linear regression models, where we demonstrate
very substantial reductions in implementation...
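
To make the idea of a matrix decomposition used inside a differentiable computation more concrete, here is a small pure-Python sketch. It computes a log-determinant from a Cholesky factor and checks one entry of its known gradient, $\partial \log\det A / \partial A_{00} = (A^{-1})_{00}$, by finite differences. This is not the MXNet implementation or the backward pass from the paper; the helper names `cholesky` and `logdet_via_cholesky` and the test matrix are assumptions for illustration.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L^T = a, for a symmetric positive definite a."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def logdet_via_cholesky(a):
    """log det(a) = 2 * sum_i log(L_ii), where L is the Cholesky factor of a."""
    l = cholesky(a)
    return 2.0 * sum(math.log(l[i][i]) for i in range(len(a)))


A = [[4.0, 1.0], [1.0, 3.0]]  # det(A) = 11, so (A^{-1})_{00} = 3/11

# Central finite difference with respect to the diagonal entry A[0][0].
eps = 1e-6
A_plus = [[4.0 + eps, 1.0], [1.0, 3.0]]
A_minus = [[4.0 - eps, 1.0], [1.0, 3.0]]
fd_grad_00 = (logdet_via_cholesky(A_plus) - logdet_via_cholesky(A_minus)) / (2 * eps)

print(logdet_via_cholesky(A), fd_grad_00, 3.0 / 11.0)
```

A framework with the decomposition available as a differentiable operator would return this gradient analytically instead of by finite differences.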

Sample Sizes : None.

Authors: 5

Total Words: 12940

Unique Words: 3466

In this paper we present the new Dune-Python module which provides Python
bindings for the Dune core, which is a C++ environment for solving partial
differential equations. The aim of this new module is first to provide the
general infrastructure for exporting realizations of statically polymorphic
interfaces based on just-in-time compilation and second to provide bindings
for the central interfaces of the Dune core modules. In the first release we
focus on the grid interface. Our aim is to only introduce a thin layer when
passing objects into Python which can be removed when the object is passed back
into a C++ algorithm. Thus no efficiency is lost and little additional code
maintenance cost is incurred. To make the transition for Dune users to the
Python environment straightforward, the Python classes provide a very similar
interface to their C++ counterparts. In addition, vectorized versions of many
interfaces allow for more efficient code on the Python side. The infrastructure
for exporting these interfaces and the...

Sample Sizes : None.

Authors: 2

Total Words: 15195

Unique Words: 3285

SiMRX is an MRX simulation toolbox written in MATLAB for the simulation of
realistic 2D and 3D Magnetorelaxometry (MRX) setups, including coils, sensors
and activation patterns. MRX is a new modality that uses magnetic nanoparticles
(MNP) as a contrast agent and shows promising results in medical applications,
e.g., cancer treatment. Its basic principles were outlined in [Baumgarten et
al., 2008], further elaborated in [Liebl et al., 2014], transferred into a
rigorous mathematical model and analyzed in [Föcke et al., 2018].
SiMRX is available at https://gitlab.com/foecke/SiMRX/.

MathPaper:
SiMRX - A Simulation toolbox for MRX. https://t.co/b10mb5MV3z

Sample Sizes : None.

Authors: 1

Total Words: 2726

Unique Words: 1001

Applications that exploit the architectural details of high performance
computing (HPC) systems have become increasingly valuable in academia and
industry over the past two decades. The most important hardware development of
the last decade in HPC has been the General Purpose Graphics Processing Unit
(GPGPU), a class of massively parallel devices that now contributes the
majority of computational power in the top 500 supercomputers. As these systems
grow, small costs such as latency---the fixed cost of memory
accesses---accumulate over the numerous iterations in a large simulation and
become a significant barrier to performance. The swept time-space decomposition
rule is a communication-avoiding technique for time-stepping stencil update
formulas that attempts to sidestep a portion of the latency costs. This work
extends the swept rule by targeting heterogeneous, CPU/GPU architectures
representative of current and future HPC systems. We compare our approach to a
naive decomposition scheme with two test equations using an MPI+CUDA...

PhysicsPaper:
Applying the swept rule for explicit partial differential equation solutions on heterogeneous computing systems. https://t.co/dUV0BL88Aj

Sample Sizes : None.

Authors: 2

Total Words: 6881

Unique Words: 2242

This is a tutorial in applied and computational topology and topological data
analysis. It is illustrated with numerous computational examples that use the
Gudhi library. It is under constant development, so please do not consider this
version as final.

MathPaper:
Computational and applied topology, tutorial. https://t.co/eki8OyMawI

Sample Sizes : None.

Authors: 1

Total Words: 24480

Unique Words: 3853

The efficacy of deep learning has resulted in it becoming one of the most
important applications run in data centers today. The NVIDIA Tesla V100 GPU
introduced a specialized functional unit called the Tensor Core to meet growing
demand for higher performance on this workload. To exploit the full capability
of current NVIDIA GPUs, machine learning researchers have started to use Tensor
Cores. For example, five of the six finalists for the 2018 Gordon Bell Award used
Tensor Cores in their work. However, currently no open-source GPU microarchitectural
simulators model Tensor Cores. In this paper, we comprehensively investigate
NVIDIA's Tensor Core implementation found in Volta and Turing architectures and
propose an architectural model for it. Our Tensor Core timing model,
implemented in GPGPU-Sim, achieves 99.6% IPC correlation versus a physical V100
GPU. Building upon this, we also enable GPGPU-Sim to run NVIDIA's CUTLASS, an
open-source CUDA C++ template library providing customizable GEMM templates,
including support for Tensor Cores.

ogawa_tter:
=>
"Modeling Deep Learning Accelerator Enabled GPUs", .., Tor Aamodt, arXiv, Nov 19, 2018 https://t.co/HbBnDjzaSV
Comprehensively investigate NVIDIA's Tensor Core
Tensor Core timing model, achieves 99.6% IPC correlation vs. a physical V100
Ref https://t.co/biXLICYe1y https://t.co/UVNWkHYybh

nmfeeds:
[O] https://t.co/V7IKJYppjA Modeling Deep Learning Accelerator Enabled GPUs. The efficacy of deep learning has resulted in...

ComputerPapers:
Modeling Deep Learning Accelerator Enabled GPUs. https://t.co/j14mUzsmtX

MUKULBHALLA7:
https://t.co/5ItBB2ScKV

ProfMatsuoka:
RT @ogawa_tter: =>
"Modeling Deep Learning Accelerator Enabled GPUs", .., Tor Aamodt, arXiv, Nov 19, 2018 https://t.co/HbBnDjzaSV
Comprehen…

xiangze750:
RT @ogawa_tter: =>
"Modeling Deep Learning Accelerator Enabled GPUs", .., Tor Aamodt, arXiv, Nov 19, 2018 https://t.co/HbBnDjzaSV
Comprehen…

ilyesgouta:
RT @ogawa_tter: =>
"Modeling Deep Learning Accelerator Enabled GPUs", .., Tor Aamodt, arXiv, Nov 19, 2018 https://t.co/HbBnDjzaSV
Comprehen…

Yugi_NHC:
RT @ogawa_tter: =>
"Modeling Deep Learning Accelerator Enabled GPUs", .., Tor Aamodt, arXiv, Nov 19, 2018 https://t.co/HbBnDjzaSV
Comprehen…

horiken2004:
RT @ogawa_tter: =>
"Modeling Deep Learning Accelerator Enabled GPUs", .., Tor Aamodt, arXiv, Nov 19, 2018 https://t.co/HbBnDjzaSV
Comprehen…

Sample Sizes : None.

Authors: 3

Total Words: 8291

Unique Words: 2181

Clawpack is a library for solving nonlinear hyperbolic partial differential
equations using high-resolution finite volume methods based on Riemann solvers
and limiters. It supports Adaptive Mesh Refinement (AMR), which is essential in
solving multi-scale problems. Recently, we added capabilities to accelerate the
code by using the Graphics Processing Unit (GPU). Routines that manage CPU and GPU
AMR data and facilitate the execution of GPU kernels have been added. Customized and
CPU thread-safe memory managers are designed to manage GPU and CPU memory
pools, which is essential in eliminating the overhead of memory allocation and
de-allocation. A global reduction is conducted at every time step to dynamically
adjust the time step based on Courant number restrictions. Some small GPU
kernels are merged into bigger kernels, which greatly reduces kernel launching
overhead. A speed-up by a factor of between $2$ and $3$ in total running time is
observed in an acoustics benchmark problem.
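
The time-step adjustment mentioned above can be pictured with a minimal Python sketch: a global maximum over local wave speeds (the reduction) followed by a Courant-number-based choice of the next step. This is a generic one-dimensional CFL rule, not Clawpack's actual routine; the function name `cfl_time_step`, the target Courant number, and the sample speeds are assumptions for illustration.

```python
def cfl_time_step(wave_speeds, dx, cfl_target=0.9):
    """Return a time step dt such that max|s| * dt / dx equals cfl_target.

    wave_speeds: iterable of local characteristic speeds over all cells.
    dx: cell width (uniform grid assumed in this sketch).
    """
    # Global reduction: a single maximum over every cell's wave speed.
    max_speed = max(abs(s) for s in wave_speeds)
    if max_speed == 0.0:
        raise ValueError("all wave speeds are zero; the CFL rule gives no bound")
    return cfl_target * dx / max_speed


dt = cfl_time_step([1.0, -2.0, 0.5], dx=0.1, cfl_target=0.9)
print(dt)
```

On a GPU, the `max` would typically be computed by a parallel reduction kernel, which is why it appears as a separate cost at every time step.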

Sample Sizes : None.

Authors: 3

Total Words: 11135

Unique Words: 2341

In (Franceschi et al., 2018) we proposed a unified mathematical framework,
grounded in bilevel programming, that encompasses gradient-based hyperparameter
optimization and meta-learning. We formulated an approximate version of the
problem where the inner objective is solved iteratively, and gave sufficient
conditions ensuring convergence to the exact problem. In this work we show how
to optimize learning rates, automatically weight the loss of single examples
and learn hyper-representations with Far-HO, a software package based on the
popular deep learning framework TensorFlow that makes it possible to seamlessly
tackle both hyperparameter optimization (HO) and meta-learning (ML) problems.
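
A minimal sketch of gradient-based learning-rate optimization in a bilevel setting is given below. The inner problem is a scalar quadratic solved by a fixed number of gradient-descent steps, and the derivative of the validation loss with respect to the learning rate is propagated forward through those steps. This does not use the Far-HO API; the function name, the quadratic objectives, and all constants are assumptions chosen so that the example stays self-contained.

```python
def validation_loss_and_hypergradient(lr, steps=20, w0=0.0, a=1.0, b=3.0, v=2.5):
    """Unroll `steps` inner updates and differentiate through them.

    Inner (training) objective: 0.5 * a * (w - b)^2.
    Outer (validation) objective: 0.5 * (w_T - v)^2.
    Returns the validation loss and its derivative with respect to lr.
    """
    w = w0
    z = 0.0  # z tracks d w_t / d lr (forward-mode derivative)
    for _ in range(steps):
        g = a * (w - b)          # inner gradient at the current w
        z = z - g - lr * a * z   # derivative of the update w - lr * g w.r.t. lr
        w = w - lr * g           # inner gradient-descent step
    val_loss = 0.5 * (w - v) ** 2
    return val_loss, (w - v) * z


# Outer loop: plain gradient descent on the learning rate itself.
lr = 0.05
initial_loss, _ = validation_loss_and_hypergradient(lr)
for _ in range(200):
    _, hypergrad = validation_loss_and_hypergradient(lr)
    lr -= 0.001 * hypergrad
tuned_loss, _ = validation_loss_and_hypergradient(lr)
print(lr, initial_loss, tuned_loss)
```

Forward-mode propagation is convenient here because there is a single hyperparameter; with many hyperparameters a reverse-mode pass through the unrolled steps is usually the cheaper choice.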

Gradient based hyperparameter optimization & meta-learning package for TensorFlow

Stargazers: 31

Subscribers: 4

Forks: 6

Open Issues: 1

Sample Sizes : None.

Authors: 5

Total Words: 3015

Unique Words: 1232

Many classical finite elements such as the Argyris and Bell elements have
long been absent from high-level PDE software. Building on recent theoretical
work, we describe how to implement very general finite element transformations
in FInAT and hence into the Firedrake finite element system. Numerical results
evaluate the new elements, comparing them to existing methods for classical
problems. For a second-order model problem, we find that the new elements give
smooth solutions at a mild increase in cost over standard Lagrange elements.
For fourth-order problems, however, the newly-enabled methods significantly
outperform interior penalty formulations. We also give some advanced use cases,
solving the nonlinear Cahn-Hilliard equation and some biharmonic eigenvalue
problems (including Chladni plates) using $C^1$ discretizations.

_wence:
New preprint with Rob Kirby on adding C¹ elements to FInAT https://t.co/y9YQMAruDg. As a bonus, I got to make some pretty pictures. https://t.co/PaS6sWGRHW

Sample Sizes : None.

Authors: 2

Total Words: 10597

Unique Words: 3119

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

*Tracking 72,893 papers.*
