Top 10 arXiv Papers Today in Databases


0.0 Mikeys
#1. GPU-based Commonsense Paradigms Reasoning for Real-Time Query Answering and Multimodal Analysis
Nguyen Ha Tran, Erik Cambria
We utilize commonsense knowledge bases to address the problem of real-time multimodal analysis. In particular, we focus on the problem of multimodal sentiment analysis, which consists in the simultaneous analysis of different modalities, e.g., speech and video, for emotion and polarity detection. Our approach takes advantage of the massively parallel processing power of modern GPUs to enhance the performance of feature extraction from different modalities. In addition, in order to extract important textual features from multimodal sources we generate domain-specific graphs based on commonsense knowledge and apply GPU-based graph traversal for fast feature detection. Then, powerful ELM classifiers are applied to build the sentiment analysis model based on the extracted features. We conduct our experiments on the YouTube dataset and achieve an accuracy of 78% which outperforms all previous systems. In terms of processing speed, our method shows improvements of several orders of magnitude for feature extraction compared to...
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 37496
Unique Words: 6453

0.0 Mikeys
#2. Budget-aware Online Task Assignment in Spatial Crowdsourcing
Jia-Xu Liu, Ke Xu
The prevalence of mobile internet techniques stimulates the emergence of various spatial crowdsourcing applications. Certain of the applications serve for requesters, budget providers, who submit a batch of tasks and a fixed budget to platform with the desire to search suitable workers to complete the tasks in maximum quantity. Platform lays stress on optimizing assignment strategies on seeking less budget-consumed worker-task pairs to meet requesters' demands. Existing research on the task assignment with budget constraint mostly focuses on static offline scenarios, where the spatiotemporal information of all workers and tasks is known in advance. However, workers usually appear dynamically on real spatial crowdsourcing platforms, where existing solutions can hardly handle it. In this paper, we formally define a novel problem Budget-aware Online task Assignment (BOA) in spatial crowdsourcing applications. BOA aims to maximize the number of assigned worker-task pairs under a budget constraint where workers appear dynamically on...
Figures
Tweets
ComputerPapers: Budget-aware Online Task Assignment in Spatial Crowdsourcing. https://t.co/DPqX3M9sfI
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 8489
Unique Words: 2172

0.0 Mikeys
#3. Towards Self-Tuning Parameter Servers
Chris Liu, Pengfei Zhang, Bo Tang, Hang Shen, Lei Zhu, Ziliang Lai, Eric Lo
In recent years, advances in many applications have been driven by the use of Machine Learning (ML). Nowadays, it is common to see industrial-strength machine learning jobs that involve millions of model parameters, terabytes of training data, and weeks of training. Good efficiency, i.e., fast completion time of running a specific ML job, therefore, is a key feature of a successful ML system. While the completion time of a long-running ML job is determined by the time required to reach model convergence, practically that is also largely influenced by the values of various system settings. In this paper, we contribute techniques towards building self-tuning parameter servers. Parameter Server (PS) is a popular system architecture for large-scale machine learning systems; and by self-tuning we mean while a long-running ML job is iteratively training the expert-suggested model, the system is also iteratively learning which system setting is more efficient for that job and applies it online. While our techniques are general enough to...
Figures
Tweets
nmfeeds: [O] https://t.co/EhIcUw4nM0 Towards Self-Tuning Parameter Servers. Recent years, many applications have been driven advanc...
ComputerPapers: Towards Self-Tuning Parameter Servers. https://t.co/emQImckFpO
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 7
Total Words: 11785
Unique Words: 3338

0.0 Mikeys
#4. Plato: Approximate Analytics over Compressed Time Series with Tight Deterministic Error Guarantees
Etienne Boursier, Jaqueline J. Brito, Chunbin Lin, Yannis Papakonstantinou
Plato provides sound and tight deterministic error guarantees for approximate analytics over compressed time series. Plato supports expressions that are compositions of the (commonly used in time series analytics) linear algebra operators over vectors, along with arithmetic operators. Such analytics can express common statistics (such as correlation and cross-correlation) that may combine multiple time series. The time series are segmented either by fixed-length segmentation or by (more effective) variable-length segmentation. Each segment (i) is compressed by an estimation function that approximates the actual values and is coming from a user-chosen estimation function family, and (ii) is associated with one to three (depending on the case) precomputed error measures. Then Plato is able to provide tight deterministic error guarantees for the analytics over the compressed time series. This work identifies two broad estimation function family groups. The Vector Space (VS) family and the presently defined Linear Scalable Family...
Figures
Tweets
M157q_News_RSS: Plato: Approximate Analytics over Compressed Time Series with Tight Deterministic Error Guarantees. (arXiv:1808.0487 https://t.co/9GLp0tQt6O
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 14003
Unique Words: 3253

0.0 Mikeys
#5. QR2: A Third-party Query Reranking Service Over Web Databases
Yeshwanth D. Gunasekaran, Abolfazl Asudeh, Sona Hasani, Nan Zhang, Ali Jaoua, Gautam Das
The ranked retrieval model has rapidly become the de-facto way for search query processing in web databases. Despite the extensive efforts on designing better ranking mechanisms, in practice, many such databases fail to address the diverse and sometimes contradicting preferences of users. In this paper, we present QR2, a third-party service that uses nothing but the public search interface of a web database and enables the on-the-fly processing of queries with any user-specified ranking functions, no matter if the ranking function is supported by the database or not.
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 6
Total Words: 3142
Unique Words: 1127

0.0 Mikeys
#6. EmbNum: Semantic labeling for numerical values with deep metric learning
Phuc Nguyen, Khai Nguyen, Ryutaro Ichise, Hideaki Takeda
Semantic labeling is a task of matching unknown data source to labeled data sources. The semantic labels could be properties, classes in knowledge bases or labeled data are manually annotated by domain experts. In this paper, we present EmbNum, a novel approach to match numerical columns from different table data sources. We use a representation network architecture consisting of triplet network and convolutional neural network to learn a mapping function from numerical columns to a transformed space. In this space, the Euclidean distance can be used to measure "semantic similarity" of two columns. Our experiments on City-Data and Open-Data demonstrate that EmbNum achieves considerable improvements in comparison with the state-of-the-art methods in effectiveness and efficiency.
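The matching step described in this abstract, comparing columns by Euclidean distance in a learned space, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `euclidean` and `nearest_label`, the labels, and the 3-dimensional vectors are all hypothetical, and a real system would obtain the embeddings from a trained representation network rather than hard-coding them.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_label(query, labeled):
    """Return the label whose embedding lies closest to `query`."""
    return min(labeled, key=lambda label: euclidean(query, labeled[label]))


# Hypothetical embeddings of already-labeled numerical columns.
labeled_columns = {
    "population": [0.9, 0.1, 0.0],
    "elevation": [0.1, 0.8, 0.2],
}

# Hypothetical embedding of a column whose semantic label is unknown.
unknown_column = [0.8, 0.2, 0.1]

print(nearest_label(unknown_column, labeled_columns))
```

With these made-up vectors the unknown column lies closest to "population", so that label is the one returned.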
Figures
Tweets
HubBucket: RT @arxiv_org: EmbNum: Semantic labeling for numerical values with deep metric learning. https://t.co/eAaEtEgoj9 https://t.co/CHjEtXfdNR
akdm_bot: RT @arxiv_org: EmbNum: Semantic labeling for numerical values with deep metric learning. https://t.co/eAaEtEgoj9 https://t.co/CHjEtXfdNR
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 4
Total Words: 5708
Unique Words: 1856

0.0 Mikeys
#7. Integration of Relational and Graph Databases Functionally
Jaroslav Pokorny
A significant category of NoSQL approaches is known as graph databases. They are usually represented by one property graph. We introduce a functional approach to modelling relations and property graphs. Single-valued and multivalued functions will be sufficient in this case. Then, a typed λ-calculus, i.e., the language of lambda terms, will be used as a data manipulation language. Some integration options at the query language level are discussed.
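As a rough illustration of the idea in this abstract, the sketch below models a relational attribute as a single-valued function and a graph edge type as a multivalued function, then composes them into a query. It is only an analogy in untyped Python, not the paper's typed formalism, and the names `name`, `knows`, `names_of_friends`, and the sample data are all hypothetical.

```python
# Single-valued function: each person identifier maps to exactly one name,
# much like an attribute of a relation.
_NAME = {"p1": "Ann", "p2": "Bob", "p3": "Eve"}


def name(person):
    return _NAME[person]


# Multivalued function: each person maps to a set of persons,
# much like an edge type in a property graph.
_KNOWS = {"p1": {"p2", "p3"}, "p2": {"p3"}}


def knows(person):
    return _KNOWS.get(person, set())


# A query written as a lambda term that composes both functions:
# the names of everyone a given person knows.
names_of_friends = lambda person: {name(friend) for friend in knows(person)}

print(sorted(names_of_friends("p1")))
```

The point of the sketch is that the relational lookup and the graph traversal are both plain function applications, so one query expression can span both kinds of data.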
Figures
None.
Tweets
ComputerPapers: Integration of Relational and Graph Databases Functionally. https://t.co/0ZhVkB8AZB
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 1
Total Words: 1542
Unique Words: 796

0.0 Mikeys
#8. Ektelo: A Framework for Defining Differentially-Private Computations
Dan Zhang, Ryan McKenna, Ios Kotsogiannis, Michael Hay, Ashwin Machanavajjhala, Gerome Miklau
The adoption of differential privacy is growing but the complexity of designing private, efficient and accurate algorithms is still high. We propose a novel programming framework and system, Ektelo, for implementing both existing and new privacy algorithms. For the task of answering linear counting queries, we show that nearly all existing algorithms can be composed from operators, each conforming to one of a small number of operator classes. While past programming frameworks have helped to ensure the privacy of programs, the novelty of our framework is its significant support for authoring accurate and efficient (as well as private) programs. After describing the design and architecture of the Ektelo system, we show that Ektelo is expressive, allows for safer implementations through code reuse, and that it allows both privacy novices and experts to easily design algorithms. We demonstrate the use of Ektelo by designing several new state-of-the-art algorithms.
Figures
None.
Tweets
M157q_News_RSS: Ektelo: A Framework for Defining Differentially-Private Computations. (arXiv:1808.03555v1 [cs.DB]) https://t.co/TBUBETjfjO The adoption of d
ComputerPapers: Ektelo: A Framework for Defining Differentially-Private Computations. https://t.co/GGXEBzSDQ4
Github
Repository: dpcomp_core
User: dpcomp-org
Language: Python
Stargazers: 12
Subscribers: 11
Forks: 2
Open Issues: 2
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 6
Total Words: 16589
Unique Words: 3858

0.0 Mikeys
#9. An Approach to Handle Big Data Warehouse Evolution
Darja Solodovnikova, Laila Niedrite
One of the purposes of Big Data systems is to support analysis of data gathered from heterogeneous data sources. Since data warehouses have been used for several decades to achieve the same goal, they could be leveraged also to provide analysis of data stored in Big Data systems. The problem of adapting data warehouse data and schemata to changes in these requirements as well as data sources has been studied by many researchers worldwide. However, innovative methods must be developed also to support evolution of data warehouses that are used to analyze data stored in Big Data systems. In this paper, we propose a data warehouse architecture that allows to perform different kinds of analytical tasks, including OLAP-like analysis, on big data loaded from multiple heterogeneous data sources with different latency and is capable of processing changes in data sources as well as evolving analysis requirements. The operation of the architecture is highly based on the metadata that are outlined in the paper.
Figures
Tweets
ComputerPapers: An Approach to Handle Big Data Warehouse Evolution. https://t.co/0chxYZFHOH
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 2
Total Words: 1762
Unique Words: 729

0.0 Mikeys
#10. Improve3C: Data Cleaning on Consistency and Completeness with Currency
Xiaoou Ding, Hongzhi Wang, Jiaxuan Su, Jianzhong Li, Hong Gao
Data quality plays a key role in big data management today. With the explosive growth of data from a variety of sources, the quality of data is faced with multiple problems. Motivated by this, we study the multiple data quality improvement on completeness, consistency and currency in this paper. For the proposed problem, we introduce a 4-step framework, named Improve3C, for detection and quality improvement on incomplete and inconsistent data without timestamps. We compute and achieve a relative currency order among records derived from given currency constraints, according to which inconsistent and incomplete data can be repaired effectively considering the temporal impact. For both effectiveness and efficiency consideration, we carry out inconsistent repair ahead of incomplete repair. Currency-related consistency distance is defined to measure the similarity between dirty records and clean ones more accurately. In addition, currency orders are treated as an important feature in the training process of incompleteness repair. The...
Figures
Tweets
Github
None.
Youtube
None.
Other stats
Sample Sizes: None.
Authors: 5
Total Words: 12528
Unique Words: 2822

About

Assert is a website where the best academic papers on arXiv (computer science, math, physics), bioRxiv (biology), BITSS (reproducibility), EarthArXiv (earth science), engrXiv (engineering), LawArXiv (law), PsyArXiv (psychology), SocArXiv (social science), and SportRxiv (sport research) bubble to the top each day.

Papers are scored (in real-time) based on how verifiable they are (as determined by their Github repos) and how interesting they are (based on Twitter).

To see top papers, follow us on Twitter @assertpub_ (arXiv), @assert_pub (bioRxiv), and @assertpub_dev (everything else).

To see beautiful figures extracted from papers, follow us on Instagram.

Tracking 72,995 papers.
