Infrastructure-as-a-Service (IaaS) cloud providers hide the interfaces for VM placement and migration, CPU capping, memory ballooning, page sharing, and I/O throttling, limiting the ways in which applications can optimally configure resources or respond to dynamically shifting workloads. Given access to such interfaces, applications could migrate VMs in response to diurnal workloads or changing prices, adjust resources in response to load changes, and so on. This paper proposes a new abstraction, which we call a Library Cloud, that allows users to customize the diverse available cloud resources to best serve their applications. We built a prototype Library Cloud that we call the Supercloud. The Supercloud encapsulates applications in a virtual cloud under the user's full control, and can incorporate one or more availability zones within a single cloud provider or across different providers. The Supercloud provides virtual machine, storage, and networking abstractions, complete with a full set of management operations, allowing applications to optimize performance. In this paper, we demonstrate various innovations enabled by the Library Cloud.
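The migration scenario above (moving VMs in response to changing prices) can be illustrated with a toy placement policy. This is a minimal sketch only: the zone names, prices, latency bounds, and the `pick_zone` policy are all hypothetical, not part of the Supercloud interface.

```python
# Toy example of a user-defined placement policy of the kind a Library
# Cloud could enable. All names and numbers below are hypothetical.

def pick_zone(prices, latencies, max_latency_ms):
    """Return the cheapest zone whose latency stays within the bound,
    or None if no zone qualifies."""
    candidates = {z: p for z, p in prices.items()
                  if latencies[z] <= max_latency_ms}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# Hypothetical spot prices ($/hour) and client latencies (ms) per zone.
prices = {"provider-a/us-east": 0.12, "provider-b/us-east": 0.09,
          "provider-a/eu-west": 0.11}
latencies = {"provider-a/us-east": 20, "provider-b/us-east": 25,
             "provider-a/eu-west": 95}

# With full control over placement, the application could migrate its VMs
# to the winner whenever the price schedule shifts.
print(pick_zone(prices, latencies, max_latency_ms=50))  # provider-b/us-east
```

With provider-exposed migration interfaces, such a policy could run periodically and trigger live migration whenever the selected zone changes.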
Resource Managers like Apache YARN have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challenges (e.g., fault tolerance, task scheduling and coordination) and re-implement common mechanisms (e.g., caching, bulk-data transfers). This paper presents Apache REEF, a development framework that provides a control plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource reuse for data caching, as well as state management abstractions that greatly ease the development of elastic data-processing workflows on cloud platforms that support a Resource Manager service. REEF is being used to develop several commercial offerings, such as the Azure Stream Analytics service. Furthermore, we demonstrate the use of REEF to develop a distributed shell application, a machine learning framework, a distributed in-memory caching system, and a port of the CORFU system. REEF is currently an Apache top-level project that has attracted contributors from several institutions.
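The idea of reusing leased resources for data caching can be sketched abstractly. The `Container`, `Pool`, and task types below are illustrative stand-ins, not REEF's actual API: the point is only that keeping a container alive across tasks lets later tasks hit data the first task loaded.

```python
# Hedged sketch of resource reuse for caching: containers are retained
# across tasks, so state cached by one task survives for the next.
# None of these classes correspond to real REEF abstractions.

class Container:
    def __init__(self, cid):
        self.cid = cid
        self.cache = {}          # data retained across tasks

class Pool:
    """A pool of leased containers that outlive individual tasks."""
    def __init__(self, n):
        self.free = [Container(i) for i in range(n)]

    def run(self, task):
        c = self.free.pop()      # reuse a warm container
        try:
            return task(c)
        finally:
            self.free.append(c)  # keep the container (and its cache) alive

pool = Pool(2)

def load_and_count(c):
    if "data" not in c.cache:    # only the first task pays the load cost
        c.cache["data"] = list(range(1000))
    return len(c.cache["data"])

print(pool.run(load_and_count))  # 1000 -- loads the data
print(pool.run(load_and_count))  # 1000 -- served from the retained cache
```

Without retention, each task would re-acquire a fresh container and reload its input; with retention, only the first task pays the load cost.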
The advent of multi-core processors has given rise to new concurrent programming paradigms. In this context, Transactional Memory (TM) has emerged as a simple and effective synchronization paradigm via the familiar abstraction of atomic transactions, notably with hardware support for TM (HTM) in recent Intel processors. In this work we study an issue with great impact on the performance of HTM. Due to the optimistic and inherently limited nature of HTM, transactions may be aborted and restarted numerous times, without progress guarantees. As a result, it is up to a software scheduling library, which regulates access to the HTM, to ensure progress and optimize performance. However, mainstream HTM implementations have technical limitations that prevent the adoption of known scheduling techniques: unlike the software TM implementations used in the past, existing HTMs provide limited information on the root cause of aborts. We propose Seer, a software scheduler that addresses this restriction of HTM by leveraging probabilistic inference to identify the most likely conflicts, and by establishing dynamic fine-grained locks to serialize transactions. In an extensive evaluation study, we show that Seer improves the performance of Intel's HTM by up to 3.6x, and by 65% on average.
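The inference idea can be illustrated with a toy sketch. Since HTM reports only *that* a transaction aborted, not *which* peer caused the conflict, one can record which other transaction types were concurrently active at each abort and rank pairs by how often they co-occur with aborts. This is an illustrative reconstruction of the general idea, not Seer's actual algorithm; the function name, log format, and `min_support` threshold are assumptions.

```python
# Toy sketch: infer likely conflicting transaction pairs from abort logs
# that record only the concurrently active transaction types (hypothetical
# format; not Seer's real data structures or algorithm).
from collections import Counter

def infer_conflicts(abort_log, min_support=2):
    """abort_log: list of (aborted_tx_type, set_of_concurrent_tx_types).
    Returns candidate conflicting pairs, most frequent first."""
    pair_counts = Counter()
    for aborted, concurrent in abort_log:
        for peer in concurrent:
            # Each concurrently active peer is a suspect for this abort.
            pair_counts[frozenset((aborted, peer))] += 1
    ranked = [(pair, n) for pair, n in pair_counts.items()
              if n >= min_support]
    ranked.sort(key=lambda x: -x[1])
    return ranked

# Transaction type A keeps aborting while B is active: (A, B) is the
# likely conflict, so a scheduler could install a lock serializing
# just that pair while leaving C to run concurrently.
log = [("A", {"B"}), ("A", {"B", "C"}), ("B", {"A"}), ("C", {"B"})]
for pair, n in infer_conflicts(log):
    print(sorted(pair), n)  # ['A', 'B'] 3
```

A scheduler built on such estimates can serialize only the pairs most likely to conflict, rather than falling back to a single global lock after repeated aborts.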