##### #1. Platform-Agnostic Steal-Time Measurement in a Guest Operating System
###### Javier Verdu, Juan Jose Costa, Beatriz Otero, Eva Rodriguez, Alex Pajuelo, Ramon Canal
Steal time is a key performance metric for applications executed in a virtualized environment. Steal time measures the amount of time the processor is preempted by code outside the virtualized environment. This, in turn, makes it possible to compute accurately the execution time of an application inside a virtual machine (i.e., it excludes the time the virtual machine is suspended). Unfortunately, this metric is only available in particular scenarios in which the host and the guest OS are tightly coupled. Typical examples are the Xen hypervisor and Linux-based guest OSes. In contrast, in scenarios where the steal time is not available inside the virtualized environment, performance measurements are, most often, incorrect. In this paper, we introduce a novel and platform-agnostic approach to calculate this steal time within the virtualized environment and without the cooperation of the host OS. The theoretical execution time of a deterministic microbenchmark is compared to its execution time in a virtualized environment. When factoring in...
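The comparison described in the abstract can be illustrated with a rough sketch. It assumes the expected (theoretical) run time of the microbenchmark is already known from some calibration step; how the paper derives that value, and what it factors in afterwards, is not visible in the truncated abstract. The names `microbenchmark`, `measure`, and `estimate_steal_time` are hypothetical and not taken from the paper.

```python
import time


def microbenchmark(n=200_000):
    """Deterministic workload: the same arithmetic on every run (hypothetical)."""
    acc = 0
    for i in range(n):
        acc = (acc + i * i) % 1_000_003
    return acc


def measure(n=200_000):
    """Wall-clock time, in seconds, of one microbenchmark run inside the guest."""
    start = time.perf_counter()
    microbenchmark(n)
    return time.perf_counter() - start


def estimate_steal_time(measured_s, expected_s):
    """Assumed model: time beyond the expected run time is attributed to
    preemption from outside the guest. Negative differences are clamped to 0."""
    return max(0.0, measured_s - expected_s)
```

A caller would pass the result of `measure()` together with its calibrated baseline to `estimate_steal_time`. In practice the difference also contains noise from inside the guest (interrupts, cache effects), so this is only a first approximation.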
##### #2. TWA - Ticket Locks Augmented with a Waiting Array
###### Dave Dice, Alex Kogan
The classic ticket lock consists of ticket and grant fields. Arriving threads atomically fetch-and-increment ticket and then wait for grant to become equal to the value returned by the fetch-and-increment primitive, at which point the thread holds the lock. The corresponding unlock operation simply increments grant. This simple design has short code paths and fast handover (transfer of ownership) under light contention, but may suffer degraded scalability under high contention when multiple threads busy wait on the grant field -- so-called global spinning. We propose a variation on ticket locks where long-term waiting threads wait on locations in a waiting array instead of busy waiting on the grant field. The single waiting array is shared among all locks. Short-term waiting is accomplished in the usual manner on the grant field. The resulting algorithm, TWA, improves on ticket locks by limiting the number of threads spinning on the grant field at any given time, reducing the number of remote caches requiring invalidation from...
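As a minimal sketch of the classic ticket lock described in the abstract (not of TWA itself, whose waiting-array details are cut off above), the following Python code keeps a `ticket` and a `grant` field. Python has no hardware fetch-and-increment, so a small internal mutex stands in for that atomic primitive; this stand-in and the names `TicketLock` and `run_demo` are assumptions of the sketch.

```python
import threading
import time


class TicketLock:
    """Classic ticket lock: a ticket counter plus a grant counter."""

    def __init__(self):
        self.ticket = 0  # next ticket to hand out
        self.grant = 0   # ticket value currently allowed to hold the lock
        # Assumption: this mutex only emulates an atomic fetch-and-increment.
        self._atomic = threading.Lock()

    def _fetch_and_increment_ticket(self):
        with self._atomic:
            my_ticket = self.ticket
            self.ticket += 1
            return my_ticket

    def acquire(self):
        my_ticket = self._fetch_and_increment_ticket()
        # Global spinning: every waiter polls the same grant field.
        while self.grant != my_ticket:
            time.sleep(0)  # yield so other threads can make progress
        return my_ticket

    def release(self):
        # Only the current holder writes grant, so a plain increment is enough.
        self.grant += 1


def run_demo(num_threads=4, iterations=100):
    """Increment a shared counter under the lock and return its final value."""
    lock = TicketLock()
    state = {"counter": 0}

    def worker():
        for _ in range(iterations):
            lock.acquire()
            state["counter"] += 1  # critical section
            lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["counter"]
```

The loop in `acquire` is the global spinning the abstract refers to: all waiting threads poll the single `grant` field, which is the behavior TWA aims to limit for long-term waiters.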
##### #3. Comparative Study of Virtual Machines and Containers for DevOps Developers
###### Sumit Maheshwari, Saurabh Deochake, Ridip De, Anish Grover
In this work, we plan to develop a system to compare virtual machines with container technology. We will devise ways to measure the administrator effort required by containers vs. virtual machines (VMs). The metrics to be evaluated include the human effort required, ease of migration, resource utilization, and ease of use for both containers and virtual machines.
##### #4. DurableFS: A File System for Persistent Memory
###### Chandan Kalita, Gautam Barua, Priya Sehgal
With the availability of hybrid DRAM-NVRAM memory on the memory bus of CPUs, a number of file systems on NVRAM have been designed and implemented. In this paper we present the design and implementation of a file system on NVRAM called DurableFS, which provides atomicity and durability of file operations to applications. Due to the byte-level random accessibility of memory, it is possible to provide these guarantees without much overhead. We use standard techniques like copy-on-write for data, and a redo log for metadata changes, to build an efficient file system which provides durability and atomicity guarantees at the time a file is closed. Benchmarks on the implementation show that there is only a 7% degradation in performance due to providing these guarantees.
##### #5. BRAVO - Biased Locking for Reader-Writer Locks
###### David Dice, Alex Kogan
##### #6. New Analysis Techniques for Supporting Hard Real-Time Sporadic DAG Task Systems on Multiprocessors
###### Zheng Dong, Cong Liu
The scheduling and schedulability analysis of real-time directed acyclic graph (DAG) task systems have received much recent attention. The DAG model can accurately represent intra-task parallelism and precedence constraints existing in many application domains. Existing techniques show that analyzing the DAG model is fundamentally more challenging compared to the ordinary sporadic task model, due to the complex intra-DAG precedence constraints which may cause rather pessimistic schedulability loss. However, such increased loss is counter-intuitive because the DAG structure should better exploit the parallelism provided by the multiprocessor platform. Our observation is that the intra-DAG precedence constraints, if not carefully considered by the scheduling algorithm, may cause very unpredictable execution behaviors of subtasks in a DAG and further cause pessimistic analysis. In this paper, we present a set of novel scheduling and analysis techniques for better supporting hard real-time sporadic DAG tasks on multiprocessors, through...
##### #7. A Spin-based model checking for the simple concurrent program on a preemptive RTOS
###### Chen-Kai Lin, Ching-Chun Huang, Bow-Yaw Wang
We adapt an existing preemptive scheduling model of an RTOS kernel by eChronos from machine-assisted proof to a Spin-based model checker. The model we constructed can be verified automatically rather than by formulating proofs by hand. Moreover, we look into the design of a Linux-like real-time kernel, Piko/RT, and the specification of the ARMv7-M architecture to reconstruct the model, and use LTL to specify a simple concurrent program, the consumer/producer problem, during the development stage of the kernel. We show that under preemptive scheduling and the mechanisms of ARMv7-M, the program does not suffer from race conditions, starvation, or deadlock.
##### #8. Profiling and Improving the Duty-Cycling Performance of Linux-based IoT Devices
###### Immanuel Amirtharaj, Tai Groot, Behnam Dezfouli
Minimizing the energy consumption of Linux-based devices is an essential step towards their wide deployment in various IoT scenarios. Energy saving methods such as duty-cycling aim to address this constraint by limiting the amount of time the device is powered on. In this work we study and improve the amount of time a Linux-based IoT device is powered on to accomplish its tasks. We analyze the processes of system boot up and shutdown on two platforms, the Raspberry Pi 3 and Zero Wireless, and enhance duty-cycling performance by identifying and disabling time consuming or unnecessary units initialized in the userspace. We also study whether SD card speed and SD card capacity utilization affect boot up duration and energy consumption. In addition, we propose Pallex, a parallel execution framework built on top of the `systemd` init system to run a user application concurrently with userspace initialization. We validate the performance impact of Pallex when applied to various IoT application scenarios: (i) capturing an image,...
##### #9. Real-time Linux communications: an evaluation of the Linux communication stack for real-time robotic applications
###### Carlos San Vicente Gutiérrez, Lander Usategui San Juan, Irati Zamalloa Ugarte, Víctor Mayoral Vilches
As robotics systems become more distributed, the communications between different robot modules play a key role in the reliability of the overall robot control. In this paper, we present a study of the Linux communication stack meant for real-time robotic applications. We evaluate the real-time performance of UDP-based communications in Linux, using multi-core embedded devices as test platforms. We prove that, under an appropriate configuration, the Linux kernel greatly enhances the determinism of communications using the UDP protocol. Furthermore, we demonstrate that concurrent traffic disrupts the bounded latencies and propose a solution by separating the real-time application and the corresponding interrupt in a CPU.
##### #10. Dependency Graph Approach for Multiprocessor Real-Time Synchronization
###### Jian-Jia Chen, Georg von der Brüggen, Junjie Shi, Niklas Ueter
Over the years, many multiprocessor locking protocols have been designed and analyzed. However, the performance of these protocols highly depends on how the tasks are partitioned and prioritized and how the resources are shared locally and globally. This paper answers a few fundamental questions when real-time tasks share resources in multiprocessor systems. We explore the fundamental difficulty of the multiprocessor synchronization problem and show that a very simplified version of this problem is $\mathcal{NP}$-hard in the strong sense regardless of the number of processors and the underlying scheduling paradigm. Therefore, the allowance of preemption or migration does not reduce the computational complexity. On the positive side, we develop a dependency-graph approach that is specifically useful for frame-based real-time tasks, in which all tasks have the same period and always release their jobs at the same time. We present a series of algorithms with speedup factors between $2$ and $3$ under semi-partitioned scheduling. We...
