Top 4 arXiv Papers Today in Databases

#1. Using Mapping Languages for Building Legal Knowledge Graphs from XML Files
Ademar Crotti Junior, Fabrizio Orlandi, Declan O'Sullivan, Christian Dirschl, Quentin Reul
This paper presents our experience on building RDF knowledge graphs for an industrial use case in the legal domain. The information contained in legal information systems is often accessed through simple keyword interfaces and presented as a simple list of hits. To improve search accuracy, one may use knowledge graphs, where the semantics of the data can be made explicit. Significant research effort has been invested in the area of building knowledge graphs from semi-structured text documents, such as XML, with the prevailing approach being the use of mapping languages. In this paper, we present a semantic model for representing legal documents together with an industrial use case. We also present a set of use case requirements based on the proposed semantic model, which are used to compare and discuss the use of state-of-the-art mapping languages for building knowledge graphs for legal data.
#2. IDEALEM: Statistical Similarity Based Data Reduction
Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu
Many applications such as scientific simulation, sensing, and power grid monitoring tend to generate massive amounts of data, which should be compressed prior to storage and transmission. These data, composed mostly of floating-point values, are known to be difficult to compress using lossless compression. A few compression methods based on lossy compression have been proposed to compress this seemingly incompressible data. Unfortunately, they are all designed to minimize the Euclidean distance between the original data and the decompressed data, which fundamentally limits compression performance. We recently proposed a new class of lossy compression based on statistical similarity, called IDEALEM, which was also provided as a software package. IDEALEM has demonstrated its performance by reducing data volume much more than state-of-the-art compression methods while preserving unique patterns of data. IDEALEM can operate in two different modes depending on the stationarity of input data. This paper presents compression...
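
The contrast between Euclidean-distance-based and similarity-based reduction can be illustrated with a small Python sketch. It is not the IDEALEM implementation: the block size, the threshold, the function names, and the choice of the two-sample Kolmogorov-Smirnov statistic as the similarity measure are all assumptions made for illustration. The idea shown is only that a block is stored once, and later blocks with a similar value distribution are replaced by a reference to it.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


def reduce_blocks(
    data: Sequence[float], block_size: int, threshold: float
) -> Tuple[List[List[float]], List[Tuple[str, int]]]:
    """Store a block only if no stored block is statistically similar to it.

    block_size and threshold are hypothetical tuning parameters.
    """
    stored: List[List[float]] = []
    encoded: List[Tuple[str, int]] = []
    for start in range(0, len(data), block_size):
        block = list(data[start:start + block_size])
        # Look for the first stored block whose distribution is close enough.
        match = next(
            (i for i, s in enumerate(stored) if ks_statistic(block, s) <= threshold),
            None,
        )
        if match is None:
            stored.append(block)
            encoded.append(("raw", len(stored) - 1))
        else:
            encoded.append(("ref", match))
    return stored, encoded


# Made-up input: the second block is a permutation of the first.
data = [1, 2, 3, 4, 2, 1, 4, 3, 10, 11, 12, 13]
stored, encoded = reduce_blocks(data, block_size=4, threshold=0.25)
```

In this sketch the second block is replaced by a reference even though its Euclidean distance to the first block is nonzero, because only the distribution of values within a block is compared, not their order.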
#3. Concept-oriented model: Modeling and processing data using functions
Alexandr Savinov
We describe a new logical data model, called the concept-oriented model (COM). It uses mathematical functions as first-class constructs for data representation and data processing as opposed to using exclusively sets in conventional set-oriented models. Functions and function composition are used as primary semantic units for describing data connectivity instead of relations and relation composition (join), respectively. Grouping and aggregation are also performed by using (accumulate) functions, providing an alternative to group-by and reduce operations. This model was implemented in an open source data processing toolkit, examples of which are used to illustrate the model and its operations. The main benefit of this model is that typical data processing tasks become simpler and more natural when using functions in comparison to adopting sets and set operations.
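
A minimal Python sketch of the two ideas in this abstract, function composition in place of a join and an accumulate function in place of group-by, might look as follows. It does not use the toolkit mentioned in the abstract; the data, the dict-based representation of functions, and the helper names compose and accumulate are hypothetical.

```python
# Hypothetical data: each "function" is a dict mapping an element to its output.
item_order = {"i1": "o1", "i2": "o1", "i3": "o2"}   # item -> order
order_customer = {"o1": "alice", "o2": "bob"}       # order -> customer
item_amount = {"i1": 10.0, "i2": 5.0, "i3": 7.5}    # item -> amount


def compose(f, g):
    """Return x -> g(f(x)) as a dict, defined wherever both functions are."""
    return {x: g[y] for x, y in f.items() if y in g}


def accumulate(link, value, update, init):
    """Fold each source element's value into the target element it links to."""
    result = {}
    for x, target in link.items():
        result[target] = update(result.get(target, init), value[x])
    return result


# Connectivity by composition rather than by joining two tables.
item_customer = compose(item_order, order_customer)

# Aggregation by accumulating item amounts into the linked customer.
customer_total = accumulate(item_customer, item_amount, lambda acc, v: acc + v, 0.0)
```

Here item_customer plays the role a join result would play in a set-oriented model, and customer_total is derived without building an intermediate grouped set.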
#4. Multi-Source Spatial Entity Linkage
Suela Isaj, Torben Bach Pedersen, Esteban Zimányi
Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities, describes them with different attributes, and sometimes provides contradicting information. Hence, we introduce the spatial entity linkage problem, which finds which pairs of spatial entities belong to the same physical spatial entity. Our proposed solution (QuadSky) starts with a spatial blocking technique (QuadFlex) that creates blocks of nearby spatial entities with the time complexity of the quadtree algorithm. After pairwise comparing the spatial entities in the same block, we propose the SkyRank algorithm that ranks the compared pairs using Pareto optimality. We introduce the SkyEx-* family of algorithms that can classify the pairs with 0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs and 0.87 precision and 0.6 recall for a...
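
Ranking compared pairs by Pareto optimality can be sketched in Python as repeated skyline peeling. This is a generic illustration and not the SkyRank algorithm from the paper: the function names, the two similarity dimensions, and the example scores are assumptions. A pair dominates another if it is at least as similar on every attribute and strictly more similar on at least one; layer 0 holds the pairs that no other pair dominates.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(scores: List[Sequence[float]]) -> List[List[int]]:
    """Peel successive Pareto fronts; returns lists of indices, best layer first."""
    remaining = list(range(len(scores)))
    layers: List[List[int]] = []
    while remaining:
        # The current front: pairs not dominated by any other remaining pair.
        front = [
            i for i in remaining
            if not any(dominates(scores[j], scores[i]) for j in remaining if j != i)
        ]
        layers.append(front)
        remaining = [i for i in remaining if i not in front]
    return layers


# Hypothetical (name similarity, address similarity) scores for five compared pairs.
pair_scores = [(0.9, 0.8), (0.7, 0.9), (0.6, 0.6), (0.2, 0.3), (0.9, 0.9)]
layers = skyline_layers(pair_scores)
```

For these made-up scores, the pair at index 4 forms the first layer, the pairs at indices 0 and 1 are incomparable and share the second layer, and the remaining pairs follow. A classifier could then place a cut-off between layers to separate matching pairs from non-matching ones.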
