We propose a novel Shapley value approach to help address neural networks'
interpretability and "vanishing gradient" problems. Our method is based on an
accurate analytical approximation to the Shapley value of a neuron with ReLU
activation. This analytical approximation admits a linear propagation of
relevance across neural network layers, resulting in a simple, fast and
sensible interpretation of neural networks' decision-making process.
We then derive a globally continuous and non-vanishing Shapley gradient,
which can replace the conventional gradient in training neural network layers
with ReLU activation, leading to better training performance. We further
derive a Shapley Activation (SA) function, which is a close approximation to
ReLU but features the Shapley gradient. The SA is easy to implement in existing
machine learning frameworks. Numerical tests show that SA consistently
outperforms ReLU in training convergence, accuracy and stability.
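
For orientation, the quantity that the abstract's analytical approximation targets can be computed exactly for a very small neuron by brute force. The sketch below is not the authors' approximation or their SA function; it is the textbook permutation definition of the Shapley value applied to a single ReLU neuron. The function names, the example weights and inputs, and the convention that an absent input contributes zero are all assumptions made for illustration.

```python
from itertools import permutations
from math import factorial


def relu(z):
    return max(0.0, z)


def neuron_output(weights, inputs, bias, active):
    """ReLU output when only the inputs listed in `active` are present.

    Assumption: an absent input contributes 0 to the pre-activation.
    """
    z = bias + sum(weights[i] * inputs[i] for i in active)
    return relu(z)


def exact_shapley(weights, inputs, bias=0.0):
    """Average marginal contribution of each input over all orderings.

    Cost grows as n!, so this is only usable for a handful of inputs.
    """
    n = len(inputs)
    phi = [0.0] * n
    for order in permutations(range(n)):
        active = []
        prev = neuron_output(weights, inputs, bias, active)
        for i in order:
            active.append(i)
            cur = neuron_output(weights, inputs, bias, active)
            phi[i] += cur - prev  # marginal contribution of input i
            prev = cur
    return [p / factorial(n) for p in phi]


# Hypothetical example values.
weights = [1.0, -2.0, 0.5]
inputs = [3.0, 1.0, 4.0]
phi = exact_shapley(weights, inputs)

# Efficiency property: the values sum to f(all inputs) - f(no inputs).
total = neuron_output(weights, inputs, 0.0, [0, 1, 2])
assert abs(sum(phi) - total) < 1e-9
```

The factorial cost of this enumeration is the reason a fast analytical approximation, such as the one the abstract describes, is attractive for real networks.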


BrundageBot:
Shapley Interpretation and Activation in Neural Networks. Yadong Li and Xin Cui https://t.co/tgpgdQnjBs

arxivml:
"Shapley Interpretation and Activation in Neural Networks",
Yadong Li, Xin Cui
https://t.co/nHzKMilJad

arxiv_cs_LG:
Shapley Interpretation and Activation in Neural Networks. Yadong Li and Xin Cui https://t.co/hTQRBdEJX6


Authors: 2


We study the problem of estimating high dimensional models with underlying
sparse structures while preserving the privacy of each training example. We
develop a differentially private high-dimensional sparse learning framework
using the idea of knowledge transfer. More specifically, we propose to distill
the knowledge from a "teacher" estimator trained on a private dataset, by
creating a new dataset from auxiliary features, and then train a differentially
private "student" estimator using this new dataset. In addition, we establish
the linear convergence rate as well as the utility guarantee for our proposed
method. For sparse linear regression and sparse logistic regression, our method
achieves improved utility guarantees compared with the best known results
(Kifer et al., 2012; Wang and Gu, 2019). We further demonstrate the superiority
of our framework through both synthetic and real-world data experiments.


BrundageBot:
A Knowledge Transfer Framework for Differentially Private Sparse Learning. Lingxiao Wang and Quanquan Gu https://t.co/gdrzK4SuzH

arxivml:
"A Knowledge Transfer Framework for Differentially Private Sparse Learning",
Lingxiao Wang, Quanquan Gu
https://t.co/ZtFdd2Q5J8

arxiv_cs_LG:
A Knowledge Transfer Framework for Differentially Private Sparse Learning. Lingxiao Wang and Quanquan Gu https://t.co/NWDgGd165m


Authors: 2


As part of a quality control process in manufacturing, it is often necessary
to test whether all parts of a product satisfy a required property, with as few
inspections as possible. When multiple inspection apparatuses with different
costs and precision exist, it is desirable that testing can be carried out
cost-effectively by properly controlling the trade-off between the costs and
the precision. In this paper, we formulate this as a level set estimation (LSE)
problem under cost-dependent input uncertainty - LSE being a type of active
learning for estimating the level set, i.e., the subset of the input space in
which an unknown function value is greater or smaller than a pre-determined
threshold. Then, we propose a new algorithm for LSE under cost-dependent input
uncertainty with a theoretical convergence guarantee. We demonstrate the
effectiveness of the proposed algorithm by applying it to synthetic and real
datasets.
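
As a rough illustration of what level set estimation decides at each input point, the sketch below classifies points as above or below a threshold using confidence intervals around a surrogate model's predictions. This is a generic confidence-bound rule commonly used in the LSE literature, not the authors' algorithm, and it ignores the cost-dependent input uncertainty that the paper addresses. The function name, the `beta` width parameter, and the example means and standard deviations are assumptions.

```python
def classify_level_set(means, stds, threshold, beta=2.0):
    """Label each point by comparing its confidence interval to the threshold.

    means, stds: predictive mean and standard deviation at each candidate
    point (assumed to come from some surrogate model, e.g. a Gaussian process).
    beta: interval half-width in standard deviations (hypothetical default).
    """
    labels = []
    for m, s in zip(means, stds):
        lower, upper = m - beta * s, m + beta * s
        if lower > threshold:
            labels.append("above")      # confidently in the upper level set
        elif upper < threshold:
            labels.append("below")      # confidently in the lower level set
        else:
            labels.append("undecided")  # needs further inspection
    return labels


# Hypothetical predictions at four candidate points.
means = [1.5, 0.2, 0.9, -0.4]
stds = [0.1, 0.1, 0.3, 0.5]
labels = classify_level_set(means, stds, threshold=1.0)
```

In an active learning loop, the next inspection would typically be spent on an undecided point; choosing which apparatus to use for it is where the cost and precision trade-off described above enters.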


arxiv_cs_LG:
Active learning for level set estimation under cost-dependent input uncertainty. Yu Inatsu, Masayuki Karasuyama, Keiichi Inoue, and Ichiro Takeuchi https://t.co/I0kCWBMSoO


Authors: 4


Credit scoring models support loan approval decisions in the financial
services industry. Lenders train these models on data from previously granted
credit applications, where the borrowers' repayment behavior has been observed.
This approach creates sampling bias: the scoring model (i.e., classifier) is
trained on accepted cases only. Applying the resulting model to screen credit
applications from the population of all borrowers degrades model performance.
Reject inference comprises techniques to overcome sampling bias through
assigning labels to rejected cases. The paper makes two contributions. First,
we propose a self-learning framework for reject inference. The framework is
geared toward real-world credit scoring requirements through considering
distinct training regimes for iterative labeling and model training. Second, we
introduce a new measure to assess the effectiveness of reject inference
strategies. Our measure leverages domain knowledge to avoid artificial labeling
of rejected cases during strategy evaluation. We...


arxiv_cs_LG:
Shallow Self-Learning for Reject Inference in Credit Scoring. Nikita Kozodoi, Panagiotis Katsas, Stefan Lessmann, Luis Moreira-Matias, and Konstantinos Papakonstantinou https://t.co/4cYjvadCpt


Authors: 5


Entity resolution (ER) (record linkage or de-duplication) is the process of
merging together noisy databases, often in the absence of a unique identifier.
A major advancement in ER methodology has been the application of Bayesian
generative models. Such models provide a natural framework for clustering
records to unobserved (latent) entities, while providing exact uncertainty
quantification and tight performance bounds. Despite these advancements,
existing models do not scale to realistically sized databases (larger than 1000
records) and they do not incorporate probabilistic blocking. In this paper, we
propose "distributed Bayesian linkage" or d-blink -- the first scalable and
distributed end-to-end Bayesian model for ER, which propagates uncertainty in
blocking, matching and merging. We make several novel contributions, including:
(i) incorporating probabilistic blocking directly into the model through
auxiliary partitions; (ii) support for missing values; (iii) a
partially-collapsed Gibbs sampler; and (iv) a novel perturbation...


arxiv_cs_LG:
d-blink: Distributed End-to-End Bayesian Entity Resolution. Neil G. Marchant, Rebecca C. Steorts, Andee Kaplan, Benjamin I. P. Rubinstein, and Daniel N. Elazar https://t.co/JKFMGcFG7z


Authors: 5


The Fisher information matrix (FIM) is a key quantity in statistics: it is
required, for example, for evaluating asymptotic precisions of parameter
estimates, for computing test statistics or asymptotic distributions in
statistical testing, and for evaluating post-model-selection inference results
or optimality criteria in experimental designs. However, its exact computation
is often not trivial. In particular, in many latent variable models, it is
intricate due to the presence of unobserved variables. Therefore, the observed
FIM is usually considered in this context to estimate the FIM. Several methods
have been proposed to approximate the observed FIM when it cannot be evaluated
analytically. Among the most frequently used approaches are Monte Carlo methods
and iterative algorithms derived from the missing information principle. All
these methods require computing second derivatives of the complete-data
log-likelihood, which leads to some disadvantages from a computational point of
view. In this paper, we present a new approach to...


Authors: 2


By mixing the posterior distribution with a surrogate distribution whose
normalizing constant is tractable, we describe a new method to estimate the
normalizing constant using the Wang-Landau algorithm. We then introduce an
accelerated version of the proposed method using the momentum technique. In
addition, several extensions are discussed, including (1) a parallel variant,
which inserts a sequence of intermediate distributions between the posterior
distribution and the surrogate distribution, to further improve the efficiency
of the proposed method; (2) the use of the surrogate distribution to help
detect potential multimodality of the posterior distribution, upon which a
better sampler can be designed utilizing mode jumping algorithms; (3) a new
jumping mechanism for general reversible jump Markov chain Monte Carlo
algorithms that combines the Multiple-try Metropolis and the directional
sampling algorithm, which can be used to estimate the normalizing constant when
a surrogate distribution is difficult to come by. We...


Authors: 2


A generalization of the definition of records to functional data is proposed.
The definition is based on ranking curves using a notion of functional depth.
This approach allows us to study the curves of the number of records over time.
We focus on functional time series and apply ideas from univariate time series
to demonstrate the asymptotic distribution describing the number of records. A
unit root test is proposed as an application of functional record theory.
Through a Monte Carlo study, different scenarios of functional processes are
simulated to evaluate the performance of the unit root test. The generalized
record definition is applied to two different datasets: annual mortality rates
in France and daily curves of wind speed at Yanbu, Saudi Arabia. The record
curves are identified and the underlying functional process is studied based on
the number of record curves observed.
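
The univariate notion that the abstract generalizes can be sketched in a few lines: an upper record occurs whenever an observation exceeds every earlier one, and the running count of records forms a curve over time. The code below covers only this scalar case. The functional version described above would instead rank whole curves with a functional depth, which is not attempted here. The function name and the example series are assumptions.

```python
def count_upper_records(series):
    """Running count of upper records in a scalar series.

    An observation is a record if it is strictly larger than all
    observations before it; the first observation is always a record.
    """
    counts = []
    best = float("-inf")
    n_records = 0
    for x in series:
        if x > best:
            best = x
            n_records += 1
        counts.append(n_records)
    return counts


# Hypothetical series of six observations.
series = [2.0, 1.5, 3.1, 3.0, 4.2, 0.7]
record_curve = count_upper_records(series)
```

The growth rate of such a record-count curve is the kind of statistic a record-based unit root test could build on, since a trending or nonstationary process tends to set records more often than a stationary one.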


Sample Sizes : [100, 1000, 10000]

Authors: 2

Total Words: 13108

Unique Words: 2709

We consider the problem of detecting abrupt changes in an otherwise smoothly
evolving trend whilst the covariance and higher-order structures of the system
can experience both smooth and abrupt changes over time. The number of abrupt
change points is allowed to diverge to infinity with the jump sizes possibly
shrinking to zero. The method is based on a multiscale application of an
optimal jump-pass filter to the time series, where the scales are dense between
admissible lower and upper bounds. The MACE method is shown to be able to
detect all abrupt change points within a nearly optimal range with a prescribed
probability asymptotically. For a time series of length $n$, the computational
complexity of MACE is $O(n)$ for each scale and $O(n\log^{1+\epsilon} n)$
overall, where $\epsilon$ is an arbitrarily small positive constant.
Simulations and data analysis show that, under complex temporal dynamics, MACE
performs favourably compared with some of the state-of-the-art multiscale
change point detection methods.


Authors: 2


Modern statistical learning techniques have often emphasized prediction
performance over interpretability, giving rise to "black box" models that may
be difficult to understand and to generalize to other settings. We
conceptually divide a prediction model into interpretable and non-interpretable
portions, as a means to produce models that are highly interpretable with
little loss in performance. Implementation of the model is achieved by
considering separability of the interpretable and non-interpretable portions,
along with a doubly penalized procedure for model fitting. We specify
conditions under which convergence of model estimation can be achieved via
cyclic coordinate ascent, and the consistency of model estimation holds. We
apply the methods to datasets for microbiome host trait prediction and a
diabetes trait, and discuss practical tradeoff diagnostics to select models
with high interpretability.


Authors: 2


Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real time) based on how verifiable they are (as determined by their GitHub repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

*Tracking 189,566 papers.*
