Top 4 Arxiv Papers Today in Databases


1.998 Mikeys
#1. Using Mapping Languages for Building Legal Knowledge Graphs from XML Files
Ademar Crotti Junior, Fabrizio Orlandi, Declan O'Sullivan, Christian Dirschl, Quentin Reul
This paper presents our experience on building RDF knowledge graphs for an industrial use case in the legal domain. The information contained in legal information systems are often accessed through simple keyword interfaces and presented as a simple list of hits. In order to improve search accuracy one may avail of knowledge graphs, where the semantics of the data can be made explicit. Significant research effort has been invested in the area of building knowledge graphs from semi-structured text documents, such as XML, with the prevailing approach being the use of mapping languages. In this paper, we present a semantic model for representing legal documents together with an industrial use case. We also present a set of use case requirements based on the proposed semantic model, which are used to compare and discuss the use of state-of-the-art mapping languages for building knowledge graphs for legal data.
more | pdf | html
Figures
None.
Tweets
arxivml: "Using Mapping Languages for Building Legal Knowledge Graphs from XML Files", Ademar Crotti Junior, Fabrizio Orland… https://t.co/dJNW6rMoXC
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unqiue Words: 0

1.998 Mikeys
#2. IDEALEM: Statistical Similarity Based Data Reduction
Dongeun Lee, Alex Sim, Jaesik Choi, Kesheng Wu
Many applications such as scientific simulation, sensing, and power grid monitoring tend to generate massive amounts of data, which should be compressed first prior to storage and transmission. These data, mostly comprised of floating-point values, are known to be difficult to compress using lossless compression. A few compression methods based on lossy compression have been proposed to compress this seemingly incompressible data. Unfortunately, they are all designed to minimize the Euclidean distance between the original data and the decompressed data, which fundamentally limits compression performance. We recently proposed a new class of lossy compression based on statistical similarity, called IDEALEM, which was also provided as a software package. IDEALEM has demonstrated its performance by reducing data volume much more than state-of-the-art compression methods while preserving unique patterns of data. IDEALEM can operate in two different modes depending on the stationarity of input data. This paper presents compression...
more | pdf | html
Figures
None.
Tweets
dbworld_: https://t.co/PfjvPOu5D6 IDEALEM: Statistical Similarity Based Data Reduction. (arXiv:1911.06980v1 [cs.DB]) #databases
Github
None.
Youtube
None.
Other stats
Sample Sizes : [8, 16, 32, 64, 128, 256]
Authors: 4
Total Words: 16956
Unqiue Words: 3544

1.998 Mikeys
#3. Concept-oriented model: Modeling and processing data using functions
Alexandr Savinov
We describe a new logical data model, called the concept-oriented model (COM). It uses mathematical functions as first-class constructs for data representation and data processing as opposed to using exclusively sets in conventional set-oriented models. Functions and function composition are used as primary semantic units for describing data connectivity instead of relations and relation composition (join), respectively. Grouping and aggregation are also performed by using (accumulate) functions providing an alternative to group-by and reduce operations. This model was implemented in an open source data processing toolkit examples of which are used to illustrate the model and its operations. The main benefit of this model is that typical data processing tasks become simpler and more natural when using functions in comparison to adopting sets and set operations.
more | pdf | html
Figures
None.
Tweets
dbworld_: https://t.co/lyJR6Xp1fp Concept-oriented model: Modeling and processing data using functions. (arXiv:1911.07225v1 [cs.DB]) #databases
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 1
Total Words: 11293
Unqiue Words: 2211

1.987 Mikeys
#4. Multi-Source Spatial Entity Linkage
Suela Isaj, Torben Bach Pedersen, Esteban Zimányi
Besides the traditional cartographic data sources, spatial information can also be derived from location-based sources. However, even though different location-based sources refer to the same physical world, each one has only partial coverage of the spatial entities, describe them with different attributes, and sometimes provide contradicting information. Hence, we introduce the spatial entity linkage problem, which finds which pairs of spatial entities belong to the same physical spatial entity. Our proposed solution (QuadSky) starts with a spatial blocking technique (QuadFlex) that creates blocks of nearby spatial entities with the time complexity of the quadtree algorithm. After pairwise comparing the spatial entities in the same block, we propose the SkyRank algorithm that ranks the compared pairs using Pareto optimality. We introduce the SkyEx-* family of algorithms that can classify the pairs with 0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs and 0.87 precision and 0.6 recall for a...
more | pdf | html
Figures
None.
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 3
Total Words: 0
Unqiue Words: 0

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 225,763 papers.

Search
Sort results based on if they are interesting or reproducible.
Interesting
Reproducible
Categories
All
Astrophysics
Cosmology and Nongalactic Astrophysics
Earth and Planetary Astrophysics
Astrophysics of Galaxies
High Energy Astrophysical Phenomena
Instrumentation and Methods for Astrophysics
Solar and Stellar Astrophysics
Condensed Matter
Disordered Systems and Neural Networks
Mesoscale and Nanoscale Physics
Materials Science
Other Condensed Matter
Quantum Gases
Soft Condensed Matter
Statistical Mechanics
Strongly Correlated Electrons
Superconductivity
Computer Science
Artificial Intelligence
Hardware Architecture
Computational Complexity
Computational Engineering, Finance, and Science
Computational Geometry
Computation and Language
Cryptography and Security
Computer Vision and Pattern Recognition
Computers and Society
Databases
Distributed, Parallel, and Cluster Computing
Digital Libraries
Discrete Mathematics
Data Structures and Algorithms
Emerging Technologies
Formal Languages and Automata Theory
General Literature
Graphics
Computer Science and Game Theory
Human-Computer Interaction
Information Retrieval
Information Theory
Machine Learning
Logic in Computer Science
Multiagent Systems
Multimedia
Mathematical Software
Numerical Analysis
Neural and Evolutionary Computing
Networking and Internet Architecture
Other Computer Science
Operating Systems
Performance
Programming Languages
Robotics
Symbolic Computation
Sound
Software Engineering
Social and Information Networks
Systems and Control
Economics
Econometrics
General Economics
Theoretical Economics
Electrical Engineering and Systems Science
Audio and Speech Processing
Image and Video Processing
Signal Processing
General Relativity and Quantum Cosmology
General Relativity and Quantum Cosmology
High Energy Physics - Experiment
High Energy Physics - Experiment
High Energy Physics - Lattice
High Energy Physics - Lattice
High Energy Physics - Phenomenology
High Energy Physics - Phenomenology
High Energy Physics - Theory
High Energy Physics - Theory
Mathematics
Commutative Algebra
Algebraic Geometry
Analysis of PDEs
Algebraic Topology
Classical Analysis and ODEs
Combinatorics
Category Theory
Complex Variables
Differential Geometry
Dynamical Systems
Functional Analysis
General Mathematics
General Topology
Group Theory
Geometric Topology
History and Overview
Information Theory
K-Theory and Homology
Logic
Metric Geometry
Mathematical Physics
Numerical Analysis
Number Theory
Operator Algebras
Optimization and Control
Probability
Quantum Algebra
Rings and Algebras
Representation Theory
Symplectic Geometry
Spectral Theory
Statistics Theory
Mathematical Physics
Mathematical Physics
Nonlinear Sciences
Adaptation and Self-Organizing Systems
Chaotic Dynamics
Cellular Automata and Lattice Gases
Pattern Formation and Solitons
Exactly Solvable and Integrable Systems
Nuclear Experiment
Nuclear Experiment
Nuclear Theory
Nuclear Theory
Physics
Accelerator Physics
Atmospheric and Oceanic Physics
Applied Physics
Atomic and Molecular Clusters
Atomic Physics
Biological Physics
Chemical Physics
Classical Physics
Computational Physics
Data Analysis, Statistics and Probability
Physics Education
Fluid Dynamics
General Physics
Geophysics
History and Philosophy of Physics
Instrumentation and Detectors
Medical Physics
Optics
Plasma Physics
Popular Physics
Physics and Society
Space Physics
Quantitative Biology
Biomolecules
Cell Behavior
Genomics
Molecular Networks
Neurons and Cognition
Other Quantitative Biology
Populations and Evolution
Quantitative Methods
Subcellular Processes
Tissues and Organs
Quantitative Finance
Computational Finance
Economics
General Finance
Mathematical Finance
Portfolio Management
Pricing of Securities
Risk Management
Statistical Finance
Trading and Market Microstructure
Quantum Physics
Quantum Physics
Statistics
Applications
Computation
Methodology
Machine Learning
Other Statistics
Statistics Theory
Feedback
Online
Stats
Tracking 225,763 papers.