Statistical modeling of rainfall is an important challenge in meteorology,
particularly from the perspective of rainfed agriculture where a proper
assessment of the future availability of rainwater is necessary. The
probability models most commonly used for this purpose are the exponential,
gamma, Weibull and lognormal distributions, whose unknown parameters are
routinely estimated using the maximum likelihood estimator (MLE). However,
outliers or extreme observations are common in rainfall data, and because the
MLE is highly sensitive to them, it often leads to spurious inference. In this
paper, we discuss a robust parameter estimation approach based on minimum
density power divergence estimators (MDPDEs), which form a family of estimators
indexed by a tuning parameter and include the MLE as a special case. The tuning
parameter controls the trade-off between the efficiency and the robustness of
the resulting inference; we also discuss a procedure for data-driven optimal
selection of this tuning parameter as well as...
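
To make the role of the tuning parameter concrete, here is a minimal Python sketch of an MDPDE for the exponential distribution, the simplest of the four models listed. It assumes the standard density power divergence objective H_n(theta) = integral of f_theta^(1+alpha) minus (1 + 1/alpha) times the sample mean of f_theta(X_i)^alpha, with alpha > 0; for an exponential density with rate lambda the integral has the closed form lambda^alpha / (1 + alpha). The helper names, search bounds, optimiser (log-grid search refined by golden-section search) and synthetic data are illustrative assumptions, not the paper's implementation.

```python
import math

def dpd_objective(rate, data, alpha):
    """Empirical density power divergence objective for Exp(rate), alpha > 0.

    Uses the closed form: integral of f^(1 + alpha) over (0, inf) = rate**alpha / (1 + alpha).
    """
    integral_term = rate ** alpha / (1.0 + alpha)
    # each observation enters through f(x)^alpha, so points with tiny density barely count
    mean_term = sum(rate ** alpha * math.exp(-alpha * rate * x) for x in data) / len(data)
    return integral_term - (1.0 + 1.0 / alpha) * mean_term

def mdpde_exponential(data, alpha, lo=1e-3, hi=1e3, grid=400, iters=80):
    """Minimise the DPD objective over the rate (hypothetical helper, sketch only)."""
    f = lambda t: dpd_objective(math.exp(t), data, alpha)  # search on log(rate)
    # coarse grid on log(rate) to locate the basin of the global minimum
    step = (math.log(hi) - math.log(lo)) / (grid - 1)
    logs = [math.log(lo) + k * step for k in range(grid)]
    best = min(range(grid), key=lambda k: f(logs[k]))
    a, b = logs[max(best - 1, 0)], logs[min(best + 1, grid - 1)]
    # golden-section refinement inside the bracketing interval
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return math.exp((a + b) / 2.0)

# Exp(1) quantiles as clean data, plus five gross outliers
n = 200
clean = [-math.log(1.0 - (i + 0.5) / n) for i in range(n)]
contaminated = clean + [50.0] * 5
mle = len(contaminated) / sum(contaminated)  # MLE of the rate is 1 / sample mean
robust = mdpde_exponential(contaminated, alpha=0.5)
print(f"MLE rate: {mle:.3f}, MDPDE rate (alpha=0.5): {robust:.3f}")
```

The outliers drag the MLE of the rate well below the true value of 1, while the MDPDE stays close to it because each point is weighted by its model density raised to alpha. As alpha shrinks toward 0, the objective approaches the negative log-likelihood up to constants, so the estimate moves back toward the MLE; this is the efficiency/robustness trade-off described above.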

We study the effect of the introduction of university tuition fees on the
enrollment behavior of students in Germany. Because there are potentially many
relevant control variables and fees existed only for a short period, an
appropriate Lasso technique is crucial for identifying the magnitude and
significance of the effect. We show that a post-double selection strategy
combined with stability selection detects a significant negative impact of
fees on student enrollment and identifies the relevant variables. This
contrasts with previous empirical studies and with a plain linear panel
regression, neither of which detects any effect of tuition fees in this case.
In our study, we deal explicitly and transparently with data challenges in the
response variable and provide corresponding robust results. Moreover, we
control for spatial cross-effects capturing the heterogeneity in how fees were
introduced across federal states ("Bundesländer"), which can set their own
education policy. We also confirm the validity of our Lasso...
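
The basic post-double selection idea runs in three steps: a Lasso of the outcome on the controls, a Lasso of the treatment on the controls, and a final OLS of the outcome on the treatment plus the union of the controls selected in either step. The following is a minimal pure-Python sketch of those three steps under simplifying assumptions: a single synthetic cross-section with a continuous treatment, a fixed penalty `lam`, and hypothetical helper names (`lasso_cd`, `ols`, `post_double_selection`). It omits the panel structure, stability selection and spatial cross-effects used in the study.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]

def lasso_cd(X, y, lam, iters=100):
    """Coordinate-descent Lasso for (1/2n)||y - Xb||^2 + lam*||b||_1 on centred data."""
    n, p = len(X), len(X[0])
    b, r = [0.0] * p, list(y)  # r holds the residual y - Xb
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(iters):
        for j in range(p):
            rho = sum(X[i][j] * (r[i] + X[i][j] * b[j]) for i in range(n)) / n
            # soft-thresholding update for coordinate j
            new = max(abs(rho) - lam, 0.0) * (1.0 if rho > 0 else -1.0) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new
    return b

def ols(Z, y):
    """Least squares via the normal equations and Gaussian elimination with pivoting."""
    k = len(Z[0])
    A = [[sum(row[a] * row[c] for row in Z) for c in range(k)] for a in range(k)]
    v = [sum(row[a] * yi for row, yi in zip(Z, y)) for a in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[piv], v[c], v[piv] = A[piv], A[c], v[piv], v[c]
        for r in range(c + 1, k):
            f = A[r][c] / A[c][c]
            A[r] = [A[r][cc] - f * A[c][cc] for cc in range(k)]
            v[r] -= f * v[c]
    beta = [0.0] * k
    for c in range(k - 1, -1, -1):
        beta[c] = (v[c] - sum(A[c][cc] * beta[cc] for cc in range(c + 1, k))) / A[c][c]
    return beta

def post_double_selection(y, d, X, lam):
    """Return (effect of d on y, indices of kept controls); simplified sketch."""
    n, p = len(X), len(X[0])
    cols = [center([row[j] for row in X]) for j in range(p)]
    Xc = [[cols[j][i] for j in range(p)] for i in range(n)]
    b_y = lasso_cd(Xc, center(y), lam)  # step 1: outcome on controls
    b_d = lasso_cd(Xc, center(d), lam)  # step 2: treatment on controls
    keep = [j for j in range(p) if b_y[j] != 0.0 or b_d[j] != 0.0]
    # step 3: OLS of y on an intercept, d and the union of selected controls
    Z = [[1.0, d[i]] + [X[i][j] for j in keep] for i in range(n)]
    return ols(Z, y)[1], keep

rng = random.Random(0)
n, p = 200, 10
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
d = [row[0] + row[1] + rng.gauss(0.0, 1.0) for row in X]  # treatment depends on X0, X1
y = [-0.5 * di + row[0] + row[2] + rng.gauss(0.0, 1.0) for di, row in zip(d, X)]
theta, keep = post_double_selection(y, d, X, lam=0.1)
print(f"estimated effect: {theta:.3f}, controls kept: {keep}")
```

Selecting controls from both equations guards against dropping a confounder (here X0) that predicts the treatment strongly but the outcome only weakly, which is the failure mode of selecting on the outcome equation alone.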

A delay between the occurrence and the reporting of events often has
practical implications, for example for the amount of capital an insurance
company must hold, or for taking preventive action against infectious
diseases. Accurately estimating the number of incurred but not (yet) reported
events is an essential part of dealing properly with this phenomenon. We review
current practice for analysing such data and present a flexible regression
framework that jointly estimates the occurrence and reporting of events from
daily data. By framing this setting as an incomplete data problem, we perform
estimation with the expectation-maximization (EM) algorithm. The resulting
method is elegant, easy to understand and implement, and provides refined
forecasts at the daily level. We apply the proposed methodology to a European
general liability insurance portfolio.
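
The incomplete-data view can be illustrated with a minimal Python sketch of EM on a reporting triangle. It assumes a deliberately simplified model in which the count for occurrence day t reported with delay d is Poisson with mean lam_t * p_d, with one free intensity per day and a single delay distribution p shared by all days. This is not the regression framework of the paper, which links occurrence and reporting to covariates; the function name `em_nowcast` and the toy triangle are hypothetical.

```python
def em_nowcast(obs, iters=500):
    """obs[t][d]: events that occurred on day t and were reported d days later,
    or None if that cell is not yet observable. Returns (lam, p, ibnr)."""
    T, D = len(obs), len(obs[0])
    observed = [sum(c for c in row if c is not None) for row in obs]
    lam, p = list(observed), [1.0 / D] * D  # crude starting values
    for _ in range(iters):
        # E-step: replace each unobservable cell by its expected count lam_t * p_d
        full = [[obs[t][d] if obs[t][d] is not None else lam[t] * p[d]
                 for d in range(D)] for t in range(T)]
        # M-step: closed-form Poisson maximum likelihood on the completed table
        lam = [sum(row) for row in full]
        total = sum(lam)
        p = [sum(full[t][d] for t in range(T)) / total for d in range(D)]
    ibnr = [lam[t] - observed[t] for t in range(T)]  # incurred but not yet reported
    return lam, p, ibnr

# Toy triangle: 100 events per day, delay distribution (0.5, 0.3, 0.2), 10 days observed
T, p_true = 10, [0.5, 0.3, 0.2]
obs = [[100 * p_true[d] if t + d <= T - 1 else None for d in range(3)] for t in range(T)]
lam, p, ibnr = em_nowcast(obs)
print([round(x, 2) for x in ibnr])
```

Each iteration fills the not-yet-observable cells with their expected counts (E-step) and then re-estimates the daily intensities and the delay distribution in closed form (M-step); the IBNR estimate for day t is the fitted intensity minus what has been reported so far, so only the most recent days carry a non-zero nowcast.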

Civil registration and vital statistics (CRVS) data are used to produce national
estimates of maternal mortality, but are often subject to substantial reporting
errors due to misclassification of maternal deaths. The accuracy of CRVS
systems can be assessed by comparing CRVS-based counts of maternal and
non-maternal deaths to those obtained from specialized studies, which are
rigorous assessments of maternal mortality for a given country-period. We
developed a Bayesian bivariate random walk model to estimate the sensitivity
and specificity of maternal death reporting in CRVS data, together with the
associated CRVS adjustment factors. The model was fitted to a global data set of CRVS and
specialized study data. Validation exercises suggest that the model performs
well in terms of predicting CRVS-based proportions of maternal deaths for
country-periods without specialized studies. The new model is used by the UN
Maternal Mortality Inter-Agency Group to account for misclassification errors
when estimating maternal mortality using CRVS data.
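
The way sensitivity and specificity distort CRVS counts can be written out explicitly. Assuming sensitivity se (the share of true maternal deaths recorded as maternal), specificity sp (the share of true non-maternal deaths recorded as non-maternal) and a true maternal proportion rho among all deaths, the expected CRVS-based proportion and one natural form of adjustment factor follow from the standard misclassification identity below. This is a sketch of that identity only, not the paper's random walk model, and the numbers are purely hypothetical.

```latex
\rho_{\mathrm{CRVS}} = \mathrm{se}\,\rho + (1 - \mathrm{sp})\,(1 - \rho),
\qquad
\text{adjustment factor} = \frac{\rho}{\rho_{\mathrm{CRVS}}}.

% Hypothetical example: se = 0.6, sp = 0.999, rho = 0.01
\rho_{\mathrm{CRVS}} = 0.6 \times 0.01 + 0.001 \times 0.99 = 0.00699,
\qquad
\frac{0.01}{0.00699} \approx 1.43.
```

Imperfect sensitivity makes CRVS undercount maternal deaths and pushes the factor above 1, while false positives from imperfect specificity push it back toward 1.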

Analysis of technical efficiency is an important tool in the management of
public libraries. Using data for 2017, we assess the efficiency of 4660 public
libraries established by municipalities in the Czech Republic. For this
purpose, we use data envelopment analysis (DEA) based on the Chebyshev
distance. We pay special attention to the operating environment and find that
the efficiency scores depend significantly on the population of the
municipality and on the distance to the municipality with extended powers. To
remove the effect of the operating environment, we perform DEA separately
within categories obtained from decision tree analysis and within categories
designed by an expert.

We present an analysis of anonymised admission/discharge data from an
insurance provider for Saxony and Thuringia (Germany) for the years 2010–2016.
Studying such data is necessary to derive the structure of the healthcare
system's patient transfer network, as no patient transfer data are currently
available. The resulting hospital network can be used directly as a basis for
modelling the spread of multidrug-resistant pathogens, allowing the
effectiveness of disease-control strategies to be studied. In this paper, we
present and discuss the properties of the dataset under consideration.
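
One common way to turn admission/discharge records into a transfer network is to link consecutive stays of the same patient in different hospitals when the gap between discharge and the next admission is short. The minimal Python sketch below assumes that rule, a hypothetical record layout (patient_id, hospital_id, admission_date, discharge_date) and a hypothetical `max_gap_days` threshold; it is not necessarily the construction used in the paper.

```python
from collections import Counter, defaultdict
from datetime import date

def transfer_network(stays, max_gap_days):
    """Count directed hospital-to-hospital transfers as a weighted edge list.

    stays: iterable of (patient_id, hospital_id, admission_date, discharge_date).
    A transfer h1 -> h2 is recorded when the same patient is admitted to a
    different hospital within max_gap_days of being discharged from h1.
    """
    by_patient = defaultdict(list)
    for patient, hospital, admitted, discharged in stays:
        by_patient[patient].append((admitted, discharged, hospital))
    edges = Counter()
    for records in by_patient.values():
        records.sort()  # chronological order of admissions
        for (_, dis1, h1), (adm2, _, h2) in zip(records, records[1:]):
            gap = (adm2 - dis1).days
            if h1 != h2 and 0 <= gap <= max_gap_days:
                edges[(h1, h2)] += 1
    return edges

stays = [
    ("p1", "A", date(2010, 1, 1), date(2010, 1, 5)),
    ("p1", "B", date(2010, 1, 6), date(2010, 1, 9)),  # 1 day later: A -> B
    ("p2", "A", date(2010, 2, 1), date(2010, 2, 3)),
    ("p2", "B", date(2010, 6, 1), date(2010, 6, 2)),  # months later: no edge
    ("p3", "B", date(2010, 3, 1), date(2010, 3, 4)),
    ("p3", "C", date(2010, 3, 4), date(2010, 3, 8)),  # same day: B -> C
]
print(transfer_network(stays, max_gap_days=30))
```

The threshold controls which indirect transfers count: a gap of 0 or 1 day captures only direct transfers, while larger values also link readmissions that could still carry colonisation between hospitals.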

