Top 6 Arxiv Papers Today in Databases


0.0 Mikeys
#1. Generative Datalog with Continuous Distributions
Martin Grohe, Benjamin Lucien Kaminski, Joost-Pieter Katoen, Peter Lindner
Arguing for the need to combine declarative and probabilistic programming, B\'ar\'any et al. (TODS 2017) recently introduced a probabilistic extension of Datalog as a "purely declarative probabilistic programming language." We revisit this language and propose a more principled approach towards defining its semantics. It is based on standard notions from probability theory known as stochastic kernels and Markov processes. This allows us to extend the semantics to continuous probability distributions, thereby settling an open problem posed by B\'ar\'any et al. We show that our semantics is fairly robust, allowing both parallel execution and arbitrary chase orders when evaluating a program. We cast our semantics in the framework of infinite probabilistic databases (Grohe and Lindner, ICDT 2020), and we show that the semantics remains meaningful even when the input of a probabilistic Datalog program is an arbitrary probabilistic database.
more | pdf | html
Figures
None.
Tweets
dbworld_: https://t.co/6op0z9yob5 Generative Datalog with Continuous Distributions. (arXiv:2001.06358v1 [cs.DB]) #databases
arxiv_cslo: Generative Datalog with Continuous Distributions https://t.co/F8PpfMrIZZ
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 18218
Unqiue Words: 3563

0.0 Mikeys
#2. Efficient Radial Pattern Keyword Search on Knowledge Graphs in Parallel
Yueji Yang, Anthony K. H. Tung
Recently, keyword search on Knowledge Graphs (KGs) becomes popular. Typical keyword search approaches aim at finding a concise subgraph from a KG, which can reflect a close relationship among all input keywords. The connection paths between keywords are selected in a way that leads to a result subgraph with a better semantic score. However, such a result may not meet user information need because it relies on the scoring function to decide what keywords to link closer. Therefore, such a result may miss close connections among some keywords on which users intend to focus. In this paper, we propose a parallel keyword search engine, called RAKS. It allows users to specify a query as two sets of keywords, namely central keywords and marginal keywords. Specifically, central keywords are those keywords on which users focus more. Their relationships are desired in the results. Marginal keywords are those less focused keywords. Their connections to the central keywords are desired. In addition, they provide additional information that...
more | pdf | html
Figures
None.
Tweets
dbworld_: https://t.co/6aKA974OWv Efficient Radial Pattern Keyword Search on Knowledge Graphs in Parallel. (arXiv:2001.06770v1 [cs.DB]) #databases
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 2
Total Words: 0
Unqiue Words: 0

0.0 Mikeys
#3. AI Data Wrangling with Associative Arrays
Jeremy Kepner, Vijay Gadepally, Hayden Jananthan, Lauren Milechin, Siddharth Samsi
The AI revolution is data driven. AI "data wrangling" is the process by which unusable data is transformed to support AI algorithm development (training) and deployment (inference). Significant time is devoted to translating diverse data representations supporting the many query and analysis steps found in an AI pipeline. Rigorous mathematical representations of these data enables data translation and analysis optimization within and across steps. Associative array algebra provides a mathematical foundation that naturally describes the tabular structures and set mathematics that are the basis of databases. Likewise, the matrix operations and corresponding inference/training calculations used by neural networks are also well described by associative arrays. More surprisingly, a general denormalized form of hierarchical formats, such as XML and JSON, can be readily constructed. Finally, pivot tables, which are among the most widely used data analysis tools, naturally emerge from associative array constructors. A common foundation in...
more | pdf | html
Figures
None.
Tweets
dbworld_: https://t.co/1n4hrkTSXy AI Data Wrangling with Associative Arrays. (arXiv:2001.06731v1 [cs.DB]) #databases
leclercq_ub: RT @dbworld_: https://t.co/1n4hrkTSXy AI Data Wrangling with Associative Arrays. (arXiv:2001.06731v1 [cs.DB]) #databases
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 0
Unqiue Words: 0

0.0 Mikeys
#4. SQLFlow: A Bridge between SQL and Machine Learning
Yi Wang, Yang Yang, Weiguo Zhu, Yi Wu, Xu Yan, Yongfeng Liu, Yu Wang, Liang Xie, Ziyao Gao, Wenjing Zhu, Xiang Chen, Wei Yan, Mingjie Tang, Yuan Tang
Industrial AI systems are mostly end-to-end machine learning (ML) workflows. A typical recommendation or business intelligence system includes many online micro-services and offline jobs. We describe SQLFlow for developing such workflows efficiently in SQL. SQL enables developers to write short programs focusing on the purpose (what) and ignoring the procedure (how). Previous database systems extended their SQL dialect to support ML. SQLFlow (https://sqlflow.org/sqlflow ) takes another strategy to work as a bridge over various database systems, including MySQL, Apache Hive, and Alibaba MaxCompute, and ML engines like TensorFlow, XGBoost, and scikit-learn. We extended SQL syntax carefully to make the extension working with various SQL dialects. We implement the extension by inventing a collaborative parsing algorithm. SQLFlow is efficient and expressive to a wide variety of ML techniques -- supervised and unsupervised learning; deep networks and tree models; visual model explanation in addition to training and prediction; data...
more | pdf | html
Figures
Tweets
BrundageBot: SQLFlow: A Bridge between SQL and Machine Learning. Yi Wang, Yang Yang, Weiguo Zhu, Yi Wu, Xu Yan, Yongfeng Liu, Yu Wang, Liang Xie, Ziyao Gao, Wenjing Zhu, Xiang Chen, Wei Yan, Mingjie Tang, and Yuan Tang https://t.co/ShIinYGV8M
arxivml: "SQLFlow: A Bridge between SQL and Machine Learning", Yi Wang, Yang Yang, Weiguo Zhu, Yi Wu, Xu Yan, Yongfeng Liu, … https://t.co/a1O3sboMKL
dbworld_: https://t.co/fn79jTa4tq SQLFlow: A Bridge between SQL and Machine Learning. (arXiv:2001.06846v1 [cs.DB]) #databases
Memoirs: SQLFlow: A Bridge between SQL and Machine Learning. https://t.co/Q2tT66LHXP
darmont_lyon2: RT @dbworld_: https://t.co/fn79jTa4tq SQLFlow: A Bridge between SQL and Machine Learning. (arXiv:2001.06846v1 [cs.DB]) #databases
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 14
Total Words: 3282
Unqiue Words: 1321

0.0 Mikeys
#5. Fides: Managing Data on Untrusted Infrastructure
Sujaya Maiyya, Danny Hyun Bum Cho, Divyakant Agrawal, Amr El Abbadi
Significant amounts of data are currently being stored and managed on third-party servers. It is impractical for many small scale enterprises to own their private datacenters, hence renting third-party servers is a viable solution for such businesses. But the increasing number of malicious attacks, both internal and external, as well as buggy software on third-party servers is causing clients to lose their trust in these external infrastructures. While small enterprises cannot avoid using external infrastructures, they need the right set of protocols to manage their data on untrusted infrastructures. In this paper, we propose TFCommit, a novel atomic commitment protocol that executes transactions on data stored across multiple untrusted servers. To our knowledge, TFCommit is the first atomic commitment protocol to execute transactions in an untrusted environment without using expensive Byzantine replication. Using TFCommit, we propose an auditable data management system, Fides, residing completely on untrustworthy infrastructure....
more | pdf | html
Figures
Tweets
dbworld_: https://t.co/PNqOLtNshK Fides: Managing Data on Untrusted Infrastructure. (arXiv:2001.06933v1 [cs.DB]) #databases
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 4
Total Words: 12840
Unqiue Words: 3000

0.0 Mikeys
#6. Efficient Bitruss Decomposition for Large-scale Bipartite Graphs
Kai Wang, Xuemin Lin, Lu Qin, Wenjie Zhang, Ying Zhang
Cohesive subgraph mining in bipartite graphs becomes a popular research topic recently. An important structure k-bitruss is the maximal cohesive subgraph where each edge is contained in at least k butterflies (i.e., (2, 2)-bicliques). In this paper, we study the bitruss decomposition problem which aims to find all the k-bitrusses for k >= 0. The existing bottom-up techniques need to iteratively peel the edges with the lowest butterfly support. In this peeling process, these techniques are time-consuming to enumerate all the supporting butterflies for each edge. To relax this issue, we first propose a novel online index -- the BE-Index which compresses butterflies into k-blooms (i.e., (2, k)-bicliques). Based on the BE-Index, the new bitruss decomposition algorithm BiT-BU is proposed, along with two batch-based optimizations, to accomplish the butterfly enumeration of the peeling process in an efficient way. Furthermore, the BiT-PC algorithm is devised which is more efficient against handling the edges with high butterfly supports....
more | pdf | html
Figures
None.
Tweets
dbworld_: https://t.co/NEQqS452lq Efficient Bitruss Decomposition for Large-scale Bipartite Graphs. (arXiv:2001.06111v1 [cs.DB]) #databases
Github
None.
Youtube
None.
Other stats
Sample Sizes : None.
Authors: 5
Total Words: 11247
Unqiue Words: 2190

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 256,581 papers.

Search
Sort results based on if they are interesting or reproducible.
Interesting
Reproducible
Categories
All
Astrophysics
Cosmology and Nongalactic Astrophysics
Earth and Planetary Astrophysics
Astrophysics of Galaxies
High Energy Astrophysical Phenomena
Instrumentation and Methods for Astrophysics
Solar and Stellar Astrophysics
Condensed Matter
Disordered Systems and Neural Networks
Mesoscale and Nanoscale Physics
Materials Science
Other Condensed Matter
Quantum Gases
Soft Condensed Matter
Statistical Mechanics
Strongly Correlated Electrons
Superconductivity
Computer Science
Artificial Intelligence
Hardware Architecture
Computational Complexity
Computational Engineering, Finance, and Science
Computational Geometry
Computation and Language
Cryptography and Security
Computer Vision and Pattern Recognition
Computers and Society
Databases
Distributed, Parallel, and Cluster Computing
Digital Libraries
Discrete Mathematics
Data Structures and Algorithms
Emerging Technologies
Formal Languages and Automata Theory
General Literature
Graphics
Computer Science and Game Theory
Human-Computer Interaction
Information Retrieval
Information Theory
Machine Learning
Logic in Computer Science
Multiagent Systems
Multimedia
Mathematical Software
Numerical Analysis
Neural and Evolutionary Computing
Networking and Internet Architecture
Other Computer Science
Operating Systems
Performance
Programming Languages
Robotics
Symbolic Computation
Sound
Software Engineering
Social and Information Networks
Systems and Control
Economics
Econometrics
General Economics
Theoretical Economics
Electrical Engineering and Systems Science
Audio and Speech Processing
Image and Video Processing
Signal Processing
General Relativity and Quantum Cosmology
General Relativity and Quantum Cosmology
High Energy Physics - Experiment
High Energy Physics - Experiment
High Energy Physics - Lattice
High Energy Physics - Lattice
High Energy Physics - Phenomenology
High Energy Physics - Phenomenology
High Energy Physics - Theory
High Energy Physics - Theory
Mathematics
Commutative Algebra
Algebraic Geometry
Analysis of PDEs
Algebraic Topology
Classical Analysis and ODEs
Combinatorics
Category Theory
Complex Variables
Differential Geometry
Dynamical Systems
Functional Analysis
General Mathematics
General Topology
Group Theory
Geometric Topology
History and Overview
Information Theory
K-Theory and Homology
Logic
Metric Geometry
Mathematical Physics
Numerical Analysis
Number Theory
Operator Algebras
Optimization and Control
Probability
Quantum Algebra
Rings and Algebras
Representation Theory
Symplectic Geometry
Spectral Theory
Statistics Theory
Mathematical Physics
Mathematical Physics
Nonlinear Sciences
Adaptation and Self-Organizing Systems
Chaotic Dynamics
Cellular Automata and Lattice Gases
Pattern Formation and Solitons
Exactly Solvable and Integrable Systems
Nuclear Experiment
Nuclear Experiment
Nuclear Theory
Nuclear Theory
Physics
Accelerator Physics
Atmospheric and Oceanic Physics
Applied Physics
Atomic and Molecular Clusters
Atomic Physics
Biological Physics
Chemical Physics
Classical Physics
Computational Physics
Data Analysis, Statistics and Probability
Physics Education
Fluid Dynamics
General Physics
Geophysics
History and Philosophy of Physics
Instrumentation and Detectors
Medical Physics
Optics
Plasma Physics
Popular Physics
Physics and Society
Space Physics
Quantitative Biology
Biomolecules
Cell Behavior
Genomics
Molecular Networks
Neurons and Cognition
Other Quantitative Biology
Populations and Evolution
Quantitative Methods
Subcellular Processes
Tissues and Organs
Quantitative Finance
Computational Finance
Economics
General Finance
Mathematical Finance
Portfolio Management
Pricing of Securities
Risk Management
Statistical Finance
Trading and Market Microstructure
Quantum Physics
Quantum Physics
Statistics
Applications
Computation
Methodology
Machine Learning
Other Statistics
Statistics Theory
Feedback
Online
Stats
Tracking 256,581 papers.