This work is motivated by multimodality breast cancer imaging data, which is
quite challenging in that the signals of discrete tumor-associated
microvesicles (TMVs) are randomly distributed with heterogeneous patterns. This
imposes a significant challenge for conventional imaging regression and
dimension reduction models assuming a homogeneous feature structure. We develop
an innovative multilayer tensor learning method that incorporates heterogeneity
into a higher-order tensor decomposition and predicts disease status effectively
by utilizing subject-wise imaging features and multimodality information.
Specifically, we construct a multilayer decomposition which leverages an
individualized imaging layer in addition to a modality-specific tensor
structure. One major advantage of our approach is that it efficiently captures
the heterogeneous spatial features of signals that are not characterized by a
population structure while simultaneously integrating multimodality
information. To achieve scalable computing, we...

"Individualized Multilayer Tensor Learning with An Application in Imaging Analysis",
Xiwei Tang, Xuan Bi, Annie Qu
Sorting input objects is an important step in many machine learning
pipelines. However, the sorting operator is non-differentiable with respect to
its inputs, which prohibits end-to-end gradient-based optimization. In this
work, we propose NeuralSort, a general-purpose continuous relaxation of the
output of the sorting operator from permutation matrices to the set of unimodal
row-stochastic matrices, where every row sums to one and has a distinct arg
max. This relaxation permits straight-through optimization of any computational
graph that involves a sorting operation. Further, we use this relaxation to enable
gradient-based stochastic optimization over the combinatorially large space of
permutations by deriving a reparameterized gradient estimator for the
Plackett-Luce family of distributions over permutations. We demonstrate the
usefulness of our framework on three tasks that require learning semantic
orderings of high-dimensional objects, including a fully differentiable,
parameterized extension of the k-nearest neighbors algorithm.

"Stochastic Optimization of Sorting Networks via Continuous Relaxations",
Aditya Grover, Eric Wang, Aaron Zweig, St…
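
The relaxation described in the abstract above can be sketched in a few lines.
This is a minimal illustration, not the authors' implementation: it assumes the
relaxed matrix has rows softmax(((n + 1 - 2i) s - A 1) / tau) with
A[j][k] = |s_j - s_k|, and the names `neural_sort`, `softmax`, and `tau` are
chosen here for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax of a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def neural_sort(scores, tau=1.0):
    """Relaxed permutation matrix for sorting `scores` in descending order.

    Assumed form: row i (1-indexed) is
    softmax(((n + 1 - 2i) * s - A 1) / tau), with A[j][k] = |s_j - s_k|.
    """
    n = len(scores)
    # b[j] = sum_k |s_j - s_k|, i.e. the row sums of A.
    b = [sum(abs(sj - sk) for sk in scores) for sj in scores]
    matrix = []
    for i in range(1, n + 1):
        logits = [((n + 1 - 2 * i) * s - bj) / tau for s, bj in zip(scores, b)]
        matrix.append(softmax(logits))
    return matrix


scores = [0.1, 3.0, 1.5]
relaxed = neural_sort(scores, tau=1.0)
# Each row sums to one; the row-wise arg max gives the descending sort order.
order = [row.index(max(row)) for row in relaxed]
```

Lowering `tau` pushes each row toward a one-hot vector, so the matrix
approaches the hard permutation matrix while staying differentiable in the
scores.
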
Large-scale replication studies like the Reproducibility Project: Psychology
(RP:P) provide invaluable systematic data on scientific replicability, but most
analyses and interpretations of the data fail to agree on the definition of
"replicability" or to disentangle the inexorable consequences of known selection
bias from competing explanations. We discuss three concrete definitions of
replicability based on (1) whether published findings about the signs of
effects are mostly correct, (2) how effective replication studies are in
reproducing whatever true effect size was present in the original experiment,
and (3) whether true effect sizes tend to diminish in replication. We apply
techniques from multiple testing and post-selection inference to develop new
methods that answer these questions while explicitly accounting for selection
bias. Re-analyzing the RP:P data, we estimate that 22 out of 68 (32%) original
directional claims were false (upper confidence bound 47%); by comparison, we
estimate that among claims significant at the...

The study of international relations by definition deals with
interdependencies among countries. One form of interdependence between
countries is the diffusion of country-level features, such as policies,
political regimes, or conflict. In these studies, the outcome variable tends to
be categorical, and the primary concern is the clustering of the outcome
variable among connected countries. Statistically, such clustering is studied
with spatial econometric models. This paper instead proposes the use of a
statistical network approach to model diffusion with a binary outcome variable.
Using statistical network instead of spatial econometric models allows for a
more natural specification of the diffusion process, assuming autocorrelation
in the outcomes rather than the corresponding latent variable, and it
simplifies the inclusion of temporal dynamics, higher-level interdependencies,
and interactions between network ties and country-level features. In our
simulations, the performance of the Stochastic Actor-Oriented Model...

A detailed understanding of wind turbine performance status classification
can improve operations and maintenance in the wind energy industry. Due to
different engineering properties of wind turbines, the standard supervised
learning models used for classification do not generalize across data sets
obtained from different wind sites. We propose two methods to deal with the
transferability of the trained models: first, data normalization in the form of
power curve alignment, and second, a robust method based on convolutional
neural networks and feature-space extension. We demonstrate the success of our
methods on real-world data sets with industrial applications.

"Transferability of Operational Status Classification Models Among Different Wind Turbine Types",
Z. Trstanova, A.…
We introduce hydra (hyperbolic distance recovery and approximation), a new
method for embedding network- or distance-based data into hyperbolic space. We
show mathematically that hydra satisfies a certain optimality guarantee: It
minimizes the 'hyperbolic strain' between original and embedded data points.
Moreover, it recovers points exactly when they are located on a hyperbolic
submanifold of the feature space. Testing on real network data, we show that
hydra typically outperforms existing hyperbolic embedding methods in terms of
embedding quality.

"Hydra: A method for strain-minimizing hyperbolic embedding",
Martin Keller-Ressel, Stephanie Nargang
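
For readers unfamiliar with hyperbolic embeddings, the distance that such an
embedding tries to preserve can be computed as below. This is only a sketch
under an assumption: it uses the standard Poincaré ball model with curvature -1,
which is not necessarily the model or parameterization used by hydra, and the
function name `poincare_distance` is illustrative.

```python
import math


def poincare_distance(x, y):
    """Hyperbolic distance between two points inside the unit ball (curvature -1)."""
    sq_norm_x = sum(v * v for v in x)
    sq_norm_y = sum(v * v for v in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    arg = 1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_x) * (1.0 - sq_norm_y))
    return math.acosh(arg)


origin = (0.0, 0.0)
point = (0.5, 0.0)
d = poincare_distance(origin, point)
```

Distances grow without bound as points approach the boundary of the ball, which
is why tree-like network data often fit hyperbolic space well.
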
We propose an exact slice sampler for the hierarchical Dirichlet process (HDP)
and its associated mixture models (Teh et al., 2006). Although there are
existing MCMC algorithms for sampling from the HDP, a slice sampler has been
missing from the literature. Slice sampling is well known for its desirable
properties including its fast mixing and its natural potential for
parallelization. On the other hand, the hierarchical nature of HDPs poses
challenges to adopting a full-fledged slice sampler that automatically
truncates all the infinite measures involved without ad-hoc modifications. In
this work, we adopt the powerful idea of Bayesian variable augmentation to
address this challenge. By introducing new latent variables, we obtain a full
factorization of the joint distribution that is suitable for slice sampling.
Our algorithm has several appealing features such as (1) fast mixing; (2)
remaining exact while allowing natural truncation of the underlying
infinite-dimensional measures, as in (Kalli et al., 2011), resulting in updates
of...

"Exact slice sampler for Hierarchical Dirichlet Processes",
Arash A. Amini, Marina Paez, Lizhen Lin, Zahra S. Razaee
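
The "natural truncation" mentioned in the abstract above can be illustrated on a
plain (non-hierarchical) Dirichlet process. The sketch below is not the paper's
HDP sampler: it only shows why a slice level makes the infinite list of
stick-breaking weights effectively finite. The function name, `alpha`, and the
threshold values are assumptions made for this example.

```python
import random


def stick_breaking_until(u_min, alpha, rng):
    """Draw stick-breaking weights until the leftover mass drops below u_min."""
    weights = []
    remaining = 1.0
    while remaining >= u_min:
        v = rng.betavariate(1.0, alpha)  # stick proportion ~ Beta(1, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights, remaining


rng = random.Random(0)
weights, leftover = stick_breaking_until(u_min=0.01, alpha=2.0, rng=rng)
# Every weight not yet generated is smaller than `leftover`, hence below u_min.
# For any slice level u >= u_min, only the generated components can exceed u.
u = 0.05
active = [k for k, w in enumerate(weights) if w > u]
```

Because the truncation point is determined by the slice level rather than fixed
in advance, no approximation error is introduced by stopping early.
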
We augment linear Support Vector Machine (SVM) classifiers by adding three
important features: (i) we introduce a regularization constraint to induce a
sparse classifier; (ii) we devise a method that partitions the positive class
into clusters and selects a sparse SVM classifier for each cluster; and (iii)
we develop a method to optimize the values of controllable variables in order
to reduce the number of data points which are predicted to have an undesirable
outcome, which, in our setting, coincides with being in the positive class. The
latter feature leads to personalized prescriptions/recommendations. We apply
our methods to the problem of predicting and preventing hospital readmissions
within 30 days of discharge for patients who underwent a general surgical
procedure. To that end, we leverage a large dataset containing over 2.28
million patients who had surgeries in the period 2011--2014 in the U.S. The
dataset has been collected as part of the American College of Surgeons National
Surgical Quality Improvement Program (NSQIP).

"Prescriptive Cluster-Dependent Support Vector Machines with an Application to Reducing Hospital Readmissions",
Tai…
This note is concerned with an accurate and computationally efficient
variational Bayesian treatment of mixed-effects modelling. We focus on group
studies, i.e. empirical studies that report multiple measurements acquired in
multiple subjects. When approached from a Bayesian perspective, such
mixed-effects models typically rely upon a hierarchical generative model of the
data, whereby both within- and between-subject effects contribute to the
overall observed variance. The ensuing VB scheme can be used to assess
statistical significance at the group level and/or to capture inter-individual
differences. Alternatively, it can be seen as an adaptive regularization
procedure, which iteratively learns the corresponding within-subject priors
from estimates of the group distribution of effects of interest (cf. so-called
"empirical Bayes" approaches). We outline the mathematical derivation of the
ensuing VB scheme, whose open-source implementation is available as part of the
VBA toolbox.

"Variational Bayesian modelling of mixed-effects",
Jean Daunizeau
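
The "adaptive regularization" reading of the abstract above can be illustrated
with the simplest hierarchical model. The sketch below is not the VB scheme of
the note or the VBA toolbox API: it assumes a normal-normal hierarchy with known
variances, y_ij ~ N(theta_i, within_var) and theta_i ~ N(group_mean, between_var),
and all names are hypothetical.

```python
def shrink_subject_mean(subject_mean, n_obs, within_var, group_mean, between_var):
    """Posterior mean of a subject effect under a normal-normal hierarchy.

    Assumed model: y_ij ~ N(theta_i, within_var),
                   theta_i ~ N(group_mean, between_var).
    """
    data_precision = n_obs / within_var
    prior_precision = 1.0 / between_var
    return (data_precision * subject_mean + prior_precision * group_mean) / (
        data_precision + prior_precision
    )


subject_means = [1.8, 0.2, 1.1]
group_mean = sum(subject_means) / len(subject_means)  # plug-in group estimate
shrunk = [
    shrink_subject_mean(m, n_obs=5, within_var=1.0,
                        group_mean=group_mean, between_var=0.5)
    for m in subject_means
]
```

Each subject estimate is pulled toward the group mean, more strongly when the
subject has few observations or when between-subject variance is small.
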
High dimensional data often contain multiple facets, and several clustering
patterns (views) can co-exist under different feature subspaces. While
multi-view clustering algorithms have been proposed, uncertainty quantification
remains difficult: a particular challenge lies in the high complexity of
estimating the cluster assignment probability under each view, and/or in
efficiently sharing information across views. In this article, we propose an
empirical Bayes approach: viewing the similarity matrices generated over
subspaces as rough first-stage estimates of the co-assignment probabilities, we
obtain in their Kullback-Leibler neighborhood a refined low-rank soft cluster
graph, formed by the pairwise product of simplex coordinates. Interestingly,
each simplex coordinate directly encodes the cluster assignment uncertainty.
For multi-view clustering, we equip each similarity matrix with a mixed
membership over a small number of latent views, leading to effective dimension
reduction. With high model flexibility, the estimation can be...

"Latent Simplex Position Model: High Dimensional Multi-view Clustering with Uncertainty Quantification",
Leo L Duan
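
The "soft cluster graph" in the abstract above can be illustrated as follows.
This sketch assumes that the pairwise product of simplex coordinates means the
inner product of two membership vectors, which the abstract does not spell out;
the function name and the example memberships are hypothetical.

```python
def co_assignment(simplex_coords):
    """Pairwise co-assignment probabilities from soft cluster memberships.

    Each row of `simplex_coords` is a probability vector over K clusters.
    Entry (i, j) is sum_k w_i[k] * w_j[k] (an assumed reading of the abstract).
    """
    return [
        [sum(a * b for a, b in zip(wi, wj)) for wj in simplex_coords]
        for wi in simplex_coords
    ]


memberships = [
    [1.0, 0.0],  # confidently in cluster 0
    [0.9, 0.1],  # mostly cluster 0
    [0.0, 1.0],  # confidently in cluster 1
]
graph = co_assignment(memberships)
```

The resulting matrix has rank at most K, and a row such as `[0.9, 0.1]` carries
its assignment uncertainty directly in its coordinates.
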
