Top 3 Arxiv Papers Today in Databases

#1. High Throughput Push Based Storage Manager
Ye Zhu
The storage manager, as a key component of the database system, is responsible for organizing, reading, and delivering data to the execution engine for processing. According to the data serving mechanism, existing storage managers are either pull-based, incurring high latency, or push-based, leading to a high number of I/O requests when the CPU is busy. To address these shortcomings, this thesis proposes a push-based prefetching strategy in a column-wise storage manager. The proposed strategy implements an efficient cache layer that stores data shared among queries to reduce the number of I/O requests. The capacity of the cache is maintained by a time access-aware eviction mechanism. Our strategy enables the storage manager to coordinate multiple queries by merging their requests and to dynamically generate an optimal read order that maximizes the overall I/O throughput. We evaluated our storage manager on both a disk-based redundant array of independent disks (RAID) and an NVM Express (NVMe) solid-state drive (SSD). With the high...
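The shared cache layer and its access-time-based eviction mentioned in the abstract above can be illustrated with a minimal Python sketch. This is not the thesis's mechanism, which the abstract does not detail; it assumes plain least-recently-used eviction, and the names `BlockCache`, `read`, `load_from_disk`, and `demo` are hypothetical.

```python
from collections import OrderedDict


class BlockCache:
    """Hypothetical shared block cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # block_id -> data, oldest access first
        self.io_requests = 0         # number of reads that went to storage

    def read(self, block_id, load_from_disk):
        if block_id in self.blocks:
            # Cache hit: refresh the access time, no I/O request issued.
            self.blocks.move_to_end(block_id)
            return self.blocks[block_id]
        # Cache miss: issue one I/O request and keep the block for later queries.
        self.io_requests += 1
        data = load_from_disk(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            # Evict the block whose last access is the oldest.
            self.blocks.popitem(last=False)
        return data


def demo():
    cache = BlockCache(capacity=2)
    load = lambda block_id: f"data-{block_id}"  # stand-in for a real disk read
    # Block 1 is requested twice; the second request is served from the cache.
    for block_id in [1, 2, 1, 3, 2]:
        cache.read(block_id, load)
    return cache.io_requests, list(cache.blocks)
```

In this toy run, five reads cost four I/O requests because the repeated read of block 1 is a cache hit; a real storage manager would also merge and reorder requests, which this sketch leaves out.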
#2. Keeping Track of User Steering Actions in Dynamic Workflows
Renan Souza, Vítor Silva, José Camata, Alvaro Coutinho, Patrick Valduriez, Marta Mattoso
In long-lasting scientific workflow executions on HPC machines, computational scientists (the users in this work) often need to fine-tune several workflow parameters. These tunings are done through user steering actions that may significantly improve performance (e.g., reduce execution time) or improve the overall results. However, in executions that last for weeks, users can lose track of what has been adapted if the tunings are not properly registered. In this work, we build on provenance data management to address the problem of tracking online parameter fine-tuning in dynamic workflows steered by users. We propose a lightweight solution to capture and manage provenance of the steering actions online with negligible overhead. The resulting provenance database relates tuning data to data for domain, dataflow provenance, execution, and performance, and is available for analysis at runtime. We show how users may get a detailed view of the execution, providing insights to determine when and how to tune. We discuss the...
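The idea of registering each steering action so it can be queried at runtime can be sketched as follows. This is only an illustration under stated assumptions: the table `steering_action`, its columns, and the functions `open_provenance_db`, `record_tuning`, and `history` are hypothetical and are not the schema or API of the authors' system. It uses Python's standard `sqlite3` module with an in-memory database.

```python
import sqlite3


def open_provenance_db():
    """Create an in-memory database with a hypothetical steering-action table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE steering_action ("
        " id INTEGER PRIMARY KEY,"
        " iteration INTEGER,"
        " parameter TEXT,"
        " old_value REAL,"
        " new_value REAL)"
    )
    return conn


def record_tuning(conn, iteration, parameter, old_value, new_value):
    """Register one user tuning at the moment it is applied."""
    conn.execute(
        "INSERT INTO steering_action (iteration, parameter, old_value, new_value)"
        " VALUES (?, ?, ?, ?)",
        (iteration, parameter, old_value, new_value),
    )


def history(conn, parameter):
    """Return every recorded change of one parameter, ordered by iteration."""
    rows = conn.execute(
        "SELECT iteration, old_value, new_value FROM steering_action"
        " WHERE parameter = ? ORDER BY iteration",
        (parameter,),
    )
    return rows.fetchall()
```

A user who tightened a solver tolerance twice during a long run could call `history(conn, "tolerance")` to see when each change happened and what the previous value was.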
#3. Concurrency Protocol Aiming at High Performance of Execution and Replay for Smart Contracts
Shuaifeng Pang, Xiaodong Qi, Zhao Zhang, Cheqing Jin, Aoying Zhou
Although the emergence of programmable smart contracts lets blockchain systems readily embrace a wider range of industrial areas, executing smart contracts efficiently has become a major challenge. Due to the existence of Byzantine nodes, the mechanism for executing smart contracts is quite different from that in database systems, so existing successful concurrency control protocols from database systems cannot be employed directly. Moreover, even though smart contract execution follows a two-phase style, i.e., the miner node executes a batch of smart contracts in the first phase and the validators replay them in the second phase, existing parallel solutions optimize only the first phase and ignore the second. In this paper, we propose a novel efficient concurrency control scheme, which is the first to optimize both phases. Specifically, (i) in the first phase, we give a variant of the OCC (Optimistic Concurrency Control) protocol based on the batching feature to improve...
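For readers unfamiliar with OCC, the following Python sketch shows textbook optimistic concurrency control with version-based validation at commit time. It is not the batching-based variant proposed in the paper, whose details are cut off above; the classes `Store` and `Transaction` are hypothetical, and the sketch is single-threaded for clarity.

```python
class Store:
    """Hypothetical key-value state with a version counter per key."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def read(self, key):
        return self.values.get(key, 0), self.versions.get(key, 0)


class Transaction:
    """Buffers writes locally and validates its reads when committing."""

    def __init__(self, store):
        self.store = store
        self.read_versions = {}  # key -> version seen at first read
        self.writes = {}         # key -> value to install on commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        value, version = self.store.read(key)
        self.read_versions.setdefault(key, version)
        return value

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validation: abort if any key read has changed since it was read.
        for key, version in self.read_versions.items():
            if self.store.versions.get(key, 0) != version:
                return False
        # Write phase: install buffered writes and bump their versions.
        for key, value in self.writes.items():
            self.store.values[key] = value
            self.store.versions[key] = self.store.versions.get(key, 0) + 1
        return True
```

If two transactions both read key `a` and then write it, the first to commit succeeds and the second fails validation, so it would have to be re-executed.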
