We demonstrate applications of the Gaussian process-based landmarking
algorithm proposed in [T. Gao, S.Z. Kovalsky, I. Daubechies 2018] to geometric
morphometrics, a branch of evolutionary biology centered on the analysis and
comparison of anatomical shapes, and compare the automatically sampled
landmarks with the "ground truth" landmarks manually placed by evolutionary
anthropologists; the results suggest that Gaussian process landmarks perform
equally well or better, in terms of both spatial coverage and downstream
statistical analysis. We provide a detailed exposition of numerical procedures
and feature filtering algorithms for computing high-quality and semantically
meaningful diffeomorphisms between disk-type anatomical surfaces.

A Matlab implementation accompanying the paper "Gaussian Process Landmarking on Manifolds".

Massive numbers of meta-analysis studies are being published. A Google
Scholar search for "systematic review and meta-analysis" returned about 1.8
million hits as of July 2018. There is a need to have some way to judge the
reliability of a positive claim made in a meta-analysis that uses observational
studies. Our idea is to examine the quality of the observational studies used
in the meta-analysis and to examine the heterogeneity of those studies. We
provide background information and examples: a listing of negative studies, a
simulation of p-value plots, and three examples of p-value plots.
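
A simulation of a p-value plot can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the usual construction of plotting sorted p-values against their rank, and it assumes each study reports a z-statistic drawn from N(effect, 1). The function names and the `effect` parameter are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_p_values(n_studies, effect=0.0, seed=0):
    """Two-sided p-values for z-statistics drawn from N(effect, 1) (assumed model)."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    p_values = []
    for _ in range(n_studies):
        z = rng.gauss(effect, 1.0)
        p_values.append(2.0 * (1.0 - std_normal.cdf(abs(z))))
    return p_values


def p_value_plot_points(p_values):
    """Return (rank, p-value) pairs with the p-values sorted in ascending order."""
    return list(enumerate(sorted(p_values), start=1))


# No true effect: the points should lie close to the line p = rank / n.
null_points = p_value_plot_points(simulate_p_values(1000, effect=0.0))
# A consistent true effect: the points should hug the horizontal axis instead.
shifted_points = p_value_plot_points(simulate_p_values(1000, effect=2.0))
```

Plotting `null_points` gives a roughly 45-degree line, while `shifted_points` stays near zero for most ranks; a mixture of the two shapes would suggest heterogeneous studies.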

As the number of contributors to online peer-production systems grows, it
becomes increasingly important to predict whether the edits that users make
will eventually be beneficial to the project. Existing solutions either rely on
a user reputation system or consist of a highly specialized predictor that is
tailored to a specific peer-production system. In this work, we explore a
different point in the solution space that goes beyond user reputation but does
not involve any content-based feature of the edits. We view each edit as a game
between the editor and the component of the project. We posit that the
probability that an edit is accepted is a function of the editor's skill, of
the difficulty of editing the component and of a user-component interaction
term. Our model is broadly applicable, as it only requires observing data about
who makes an edit, what the edit affects and whether the edit survives or not.
We apply our model on Wikipedia and the Linux kernel, two examples of
large-scale peer-production systems, and we seek to...
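
The modelling idea can be sketched as below. The abstract only states that the acceptance probability depends on the editor's skill, the component's difficulty and an interaction term; the logistic link and the dot-product form of the interaction used here are assumptions made for illustration, and all names are hypothetical.

```python
import math


def acceptance_probability(skill, difficulty, user_vec, comp_vec):
    """Probability that an edit survives under an assumed logistic model.

    skill      -- scalar skill of the editor
    difficulty -- scalar difficulty of editing the component
    user_vec, comp_vec -- latent vectors whose dot product plays the role of
                          the user-component interaction term (assumption)
    """
    interaction = sum(a * b for a, b in zip(user_vec, comp_vec))
    logit = skill - difficulty + interaction
    return 1.0 / (1.0 + math.exp(-logit))


# Example call with made-up parameter values.
p = acceptance_probability(skill=1.0, difficulty=0.5,
                           user_vec=[0.2, -0.1], comp_vec=[0.5, 0.3])
```

In this form the only data needed to fit the parameters are triples of (editor, component, survived or not), which matches the data requirements described above.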

The code for the paper "Can Who-Edits-What Predict Edit Survival?"

The real estate market is exposed to many fluctuations in prices, because of
existing correlations with many variables, some of which cannot be controlled
or might even be unknown. Housing prices can increase rapidly (or in some
cases, also drop very fast), yet the numerous listings available online where
houses are sold or rented are not likely to be updated that often. In some
cases, individuals interested in selling a house (or apartment) might include
it in some online listing, and forget about updating the price. In other cases,
some individuals might be interested in deliberately setting a price below the
market price in order to sell the home faster, for various reasons.
In this paper we aim at developing a machine learning application that
identifies opportunities in the real estate market in real time, i.e., houses
that are listed with a price substantially below the market price. This program
can be useful for investors interested in the housing market.
The application is formally implemented as a regression problem,...
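
The general idea of flagging under-priced listings with a regression model can be sketched as follows. This is a toy illustration under stated assumptions, not the application described in the paper: it uses a one-feature least-squares line (price against floor area), invented example numbers, and a hypothetical 20% discount threshold.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def flag_opportunities(listings, slope, intercept, discount=0.2):
    """Return ids of listings priced more than `discount` below the predicted price."""
    flagged = []
    for listing_id, area, price in listings:
        predicted = slope * area + intercept
        if price < (1.0 - discount) * predicted:
            flagged.append(listing_id)
    return flagged


# Toy training data (invented): floor area in square metres, price in currency units.
areas = [50, 80, 100, 120]
prices = [100000, 160000, 200000, 240000]
slope, intercept = fit_line(areas, prices)

# New listings as (id, area, asking price); only "b" is far below the fitted price.
new_listings = [("a", 90, 175000), ("b", 90, 120000)]
flagged = flag_opportunities(new_listings, slope, intercept)
```

A real system would use many more features and a stronger regressor, but the decision rule (compare the asking price with the predicted market price) stays the same.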

Hierarchical random effect models are used for different purposes in clinical
research and other areas. In general, the main focus is on population
parameters related to the expected treatment effects or group differences among
all units of an upper level (e.g. subjects in many settings). Optimal designs
for estimation of population parameters are well established for many models.
However, optimal designs for the prediction for the individual units may be
different. Several settings are identified in which individual prediction may
be of interest. In this paper we determine optimal designs for the individual
predictions, e.g. in multi-center trials, and compare them to a conventional
balanced design with respect to treatment allocation. Our investigations show
that balanced designs are far from optimal if the treatment effects vary
strongly as compared to the residual error, and that more subjects should be
recruited to the active (new) treatment in multi-center trials. Nevertheless,
efficiency loss may be limited resulting in a moderate...

We have seen a massive growth of online experiments at LinkedIn, and in
industry at large. It is now more important than ever to create an intelligent
A/B platform that can truly democratize A/B testing by allowing everyone to
make quality decisions, regardless of their skillset. With the tremendous
knowledge base created around experimentation, we are able to mine through
historical data, and discover the most common causes for biased experiments. In
this paper, we share four such common causes and show how we build into our A/B
testing platform the automatic detection and diagnosis of such root causes.
These root causes are design-imposed bias, self-selection bias, the novelty
effect and the trigger-day effect. We will discuss in detail what each bias is and
the scalable algorithm we developed to detect the bias. Surfacing the
existence and root cause of bias automatically for every experiment is an
important milestone towards intelligent A/B testing.

Particle filters contain the promise of fully nonlinear data assimilation.
They have been applied in numerous science areas, but their application to the
geosciences has been limited due to their inefficiency in high-dimensional
systems in standard settings. However, huge progress has been made, and this
limitation is disappearing fast due to recent developments in proposal
densities, the use of ideas from (optimal) transportation, the use of
localisation and intelligent adaptive resampling strategies. Furthermore,
powerful hybrids between particle filters and ensemble Kalman filters and
variational methods have been developed. We present a state-of-the-art
discussion of present efforts to develop particle filters for highly
nonlinear geoscience state-estimation problems with an emphasis on atmospheric
and oceanic applications, including many new ideas, derivations, and
unifications, highlighting hidden connections, and generating a valuable tool
and guide for the community. Initial experiments show that particle filters can
be...
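
For readers unfamiliar with the basic method, a standard bootstrap (sequential importance resampling) particle filter can be sketched as below. This is the textbook baseline, not any of the advanced schemes surveyed in the paper; the scalar random-walk state model, the Gaussian observation error and all parameter names are assumptions chosen to keep the example short.

```python
import math
import random


def bootstrap_particle_filter(observations, n_particles=500,
                              process_std=1.0, obs_std=1.0, seed=0):
    """Posterior-mean estimates for a scalar random-walk state (assumed model)."""
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Forecast step: propagate each particle with the (assumed) model.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Analysis step: weight particles by the Gaussian observation likelihood.
        log_w = [-0.5 * ((y - x) / obs_std) ** 2 for x in particles]
        max_log_w = max(log_w)  # subtract the maximum to avoid underflow
        weights = [math.exp(lw - max_log_w) for lw in log_w]
        total = sum(weights)
        weights = [w / total for w in weights]
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resampling step: draw particles in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=n_particles)
    return estimates


# Twenty identical observations; the estimate should settle near the observed value.
estimates = bootstrap_particle_filter([5.0] * 20)
```

In high-dimensional systems the weights of this basic scheme collapse onto very few particles, which is the inefficiency that proposal densities, localisation and the other developments mentioned above aim to overcome.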

In this paper, the time dynamics of the daily means of wind speed measured in
complex mountainous regions are investigated. For 293 measuring stations
distributed across Switzerland, the Fisher information measure and the
Shannon entropy power are calculated. The results reveal a clear relationship
between the computed measures and both the elevation of the wind stations and
the slope of the measuring sites. In particular, the Shannon entropy power takes
its highest values, and the Fisher information measure its lowest values,
in the Alps, where the time dynamics of wind speed follow a more
disordered pattern. The spatial mapping of the calculated quantities allows the
identification of two regions within Switzerland characterized by more or less
organization/order in the time dynamics of wind speed, which is in agreement
with the topography of the Swiss territory. The present study could contribute
to a better characterization of the temporal dynamics of wind speed in complex
mountainous terrains.
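
The two measures can be illustrated numerically as follows. This sketch assumes the standard definitions for a probability density f: Shannon entropy H = -∫ f ln f dx, entropy power N = exp(2H) / (2πe), and Fisher information measure I = ∫ (f')² / f dx. It evaluates them on a grid for a known density; estimating the density from wind speed data (for example with a kernel estimator) is a separate step not shown here, and the function name is hypothetical.

```python
import math


def entropy_power_and_fisher(density, lo, hi, n=20001):
    """Shannon entropy power and Fisher information of a density on [lo, hi]."""
    dx = (hi - lo) / (n - 1)
    f = [density(lo + i * dx) for i in range(n)]
    entropy = 0.0
    fisher = 0.0
    for i in range(1, n - 1):
        if f[i] <= 0.0:
            continue  # zero-density points contribute nothing
        entropy -= f[i] * math.log(f[i]) * dx
        derivative = (f[i + 1] - f[i - 1]) / (2.0 * dx)  # central difference
        fisher += derivative * derivative / f[i] * dx
    entropy_power = math.exp(2.0 * entropy) / (2.0 * math.pi * math.e)
    return entropy_power, fisher


def gaussian(x, sigma=2.0):
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


# For a Gaussian the entropy power equals the variance and the Fisher
# information equals its reciprocal, so their product is 1.
power, fisher = entropy_power_and_fisher(gaussian, -20.0, 20.0)
```

A broad, disordered distribution gives a high entropy power and a low Fisher information, which is the opposite behaviour of the two measures described above.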

Improved communication systems, shrinking battery sizes and the price drop of
tracking devices have led to an increasing availability of trajectory tracking
data. These data are often analyzed to understand animal behavior using
mixture-type models. Due to their straightforward implementation and efficiency,
hidden Markov models are generally used, but they are based on assumptions
that are rarely verified on real data. In this work we propose a new model
based on the Logistic-Normal process. Due to a new formalization and the way we
specify the coregionalization matrix of the associated multivariate Gaussian
process, we show that our model, unlike other proposals, is invariant
with respect to the choice of the reference element and the ordering of the
probability vector's components. We estimate the model under a Bayesian
framework, using an approximation of the Gaussian process needed to avoid
impractical computational time. After a simulation study, where we show the
ability of the model to retrieve the parameters...

The process capability index (PCI) is a commonly used statistic to measure
the ability of a process to operate within the given specifications or to produce
products which meet the required quality specifications. A PCI can be univariate
or multivariate depending upon the number of process specifications or quality
characteristics of interest. Most PCIs make distributional assumptions which
are often unrealistic in practice.
This paper proposes a new multivariate non-parametric process capability
index. This index can be used when the distribution of the process or quality
parameters is either unknown or does not follow commonly used distributions
such as multivariate normal.
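
For context, the classical univariate indices Cp and Cpk, which do rely on a normality assumption, can be computed as in the sketch below. These are the standard textbook definitions and are shown only to make the notion of a capability index concrete; they are not the non-parametric multivariate index proposed in the paper, and the sample values are invented.

```python
from statistics import mean, stdev


def cp_cpk(samples, lsl, usl):
    """Classical univariate capability indices for given specification limits.

    Cp  = (USL - LSL) / (6 * sigma)
    Cpk = min(USL - mu, mu - LSL) / (3 * sigma)
    """
    mu = mean(samples)
    sigma = stdev(samples)  # sample standard deviation
    cp = (usl - lsl) / (6.0 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3.0 * sigma)
    return cp, cpk


# Invented measurements with mean 10 and sample standard deviation 1.
centered = cp_cpk([9, 10, 11], lsl=4, usl=16)
off_center = cp_cpk([9, 10, 11], lsl=7, usl=16)
```

When the process mean sits off-center between the limits, Cpk falls below Cp, as in the second call.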

