On-The-Fly Compute Centers II: Execution of Composed Services in Configurable Compute Centers
In this subproject, we are concerned with the efficient utilization of resources within highly configurable compute centers. First, we build on the scheduling results achieved in the first phase of subproject C2. Second, we incorporate the results of the completed subproject A2, especially the work on software-defined networking (SDN). This subproject emphasizes collaboration between theoretical and practical computer science on closely related issues, which we will examine with different methods (theoretical analysis, simulation, emulation and prototyping) at different levels of abstraction.
OTF Compute Centers are intended to exploit the properties of OTF services. A characteristic of OTF services is that they are composed of components and come with explicit, quantitative meta data about these compositions, e.g., resource consumption per component and data flows between components. OTF Compute Centers should therefore be able to exploit this meta data to improve both service performance and system utilization. Moreover, OTF Compute Centers will be large compute centers, and such centers are typically highly heterogeneous, with various types of computation units and persistent storage units; if a service provides meta data about its performance on different types of computation units, this information can be leveraged for better scheduling decisions as well. These centers also have one or more networks that connect their resources with each other. An OTF service can be provided by a single OTF Compute Center or by several interacting, geographically or organizationally distributed OTF Compute Centers, supplemented if necessary by resources temporarily rented from the cloud.
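As an illustration of what such meta data might look like, the following Python sketch models a composed service as a set of components, each with a resource demand and measured runtimes per type of computation unit, together with the data flows between components. All class and field names are hypothetical; OTF services do not prescribe this schema.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One component of a composed service (hypothetical schema)."""
    name: str
    memory_gb: float              # resource consumption of the component
    runtime_s: dict[str, float]   # measured runtime per type of computation unit


@dataclass
class Flow:
    """Data flow from one component to another, in megabytes per request."""
    src: str
    dst: str
    volume_mb: float


@dataclass
class ServiceMetadata:
    components: dict[str, Component] = field(default_factory=dict)
    flows: list[Flow] = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components[component.name] = component

    def traffic_between(self, a: str, b: str) -> float:
        """Total volume exchanged between components a and b, in both directions."""
        return sum(f.volume_mb for f in self.flows if {f.src, f.dst} == {a, b})

    def fastest_unit(self, name: str) -> str:
        """Type of computation unit with the lowest reported runtime for a component."""
        runtimes = self.components[name].runtime_s
        return min(runtimes, key=runtimes.get)


svc = ServiceMetadata()
svc.add(Component("decode", 2.0, {"cpu": 4.0, "fpga": 1.5}))
svc.add(Component("classify", 8.0, {"cpu": 10.0, "gpu": 2.0}))
svc.flows.append(Flow("decode", "classify", 120.0))
print(svc.fastest_unit("classify"))               # gpu
print(svc.traffic_between("classify", "decode"))  # 120.0
```

A scheduler could, for example, use `fastest_unit` to choose a device type and `traffic_between` to decide which components to co-locate.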
We will develop and analyze scheduling algorithms that take into account the characteristics of OTF services on the one hand and of OTF Compute Centers on the other. The subproject is organized into the following working areas, according to the granularity of the compute centers considered.
OTF Compute Center
In this area, we investigate the architecture of and scheduling in a single OTF Compute Center; we also build a prototype and perform simulations.
We will develop an additional component of the OTF Compute Center, the oracle, which provides information about the near future: job arrivals, job characteristics, system availability, etc. We will adapt the architecture of an OTF Compute Center to incorporate such an oracle. The focus lies on the APIs between the different scheduling plug-ins and the oracle, as well as on the APIs through which the oracle communicates its information. The goal is an architecture description of an OTF Compute Center that includes all APIs necessary for its implementation.
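The following minimal sketch shows one possible shape for the oracle API, assuming that scheduling plug-ins query the oracle for forecast jobs within a time horizon and for node availability at a given time. The names `Oracle`, `JobForecast`, `upcoming_jobs`, `node_available`, and `reserve_capacity` are hypothetical, and the `StaticOracle` stub returns a fixed forecast instead of real predictions.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class JobForecast:
    """Predicted job (hypothetical fields): arrival offset and expected runtime."""
    arrival_in_s: float
    service: str
    expected_runtime_s: float


class Oracle(Protocol):
    """Hypothetical interface that the oracle exposes to scheduling plug-ins."""
    def upcoming_jobs(self, horizon_s: float) -> list[JobForecast]: ...
    def node_available(self, node: str, at_s: float) -> bool: ...


class StaticOracle:
    """Toy oracle backed by a fixed forecast and a maintenance calendar."""

    def __init__(self, forecast: list[JobForecast],
                 maintenance: dict[str, tuple[float, float]]):
        self._forecast = forecast
        self._maintenance = maintenance  # node -> (start_s, end_s) of planned downtime

    def upcoming_jobs(self, horizon_s: float) -> list[JobForecast]:
        return [j for j in self._forecast if j.arrival_in_s <= horizon_s]

    def node_available(self, node: str, at_s: float) -> bool:
        start, end = self._maintenance.get(node, (float("inf"), float("inf")))
        return not (start <= at_s < end)


def reserve_capacity(oracle: Oracle, horizon_s: float) -> float:
    """Hypothetical plug-in: runtime to keep free for jobs forecast within the horizon."""
    return sum(j.expected_runtime_s for j in oracle.upcoming_jobs(horizon_s))


oracle = StaticOracle(
    forecast=[JobForecast(10.0, "video", 5.0), JobForecast(100.0, "ml", 7.0)],
    maintenance={"node1": (20.0, 40.0)},
)
print(reserve_capacity(oracle, horizon_s=50.0))   # 5.0
print(oracle.node_available("node1", at_s=30.0))  # False
```

Typing the plug-in against a `Protocol` keeps it independent of how the oracle actually produces its forecasts.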
In the scheduling area, we deal with the execution of OTF services in a single compute center composed of heterogeneous and configurable devices and networks. Here, we approach scheduling in heterogeneous systems with a focus on scheduling with setup times, a topic that is gaining interest with the integration of FPGAs into modern compute centers. For resource management in such compute centers, we consider limits on the data rate or on the overall energy. Data rate limits originate from bottlenecks in the actual network (simple buses or complex switching networks); energy limits may originate from high temperatures in compute centers or from bottlenecks in the power supply. In the context of composed OTF services, we also deal with scheduling under precedence constraints in order to execute services more efficiently.
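To make scheduling with setup times concrete, the sketch below assumes a simple model: each job belongs to a class (for instance, one FPGA configuration), running a job of a different class than the one a machine is currently set up for costs that machine's setup time, and processing times depend on the machine type. It uses plain greedy earliest-finish-time list scheduling as an illustrative baseline, not as one of the algorithms to be developed, and it ignores data rate, energy, and precedence constraints. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Machine:
    name: str
    kind: str                     # type of computation unit, e.g. "cpu" or "fpga"
    setup_s: float                # time to switch to another job class (e.g. reconfiguration)
    free_at: float = 0.0          # time at which the machine becomes idle
    config: Optional[str] = None  # job class the machine is currently set up for


@dataclass
class Job:
    name: str
    job_class: str                # jobs of the same class share one configuration
    proc_s: dict[str, float]      # processing time per machine kind (missing = incompatible)


def greedy_schedule(jobs: list[Job], machines: list[Machine]) -> tuple[dict[str, str], float]:
    """Assign jobs in the given order, each to the machine where it finishes earliest."""
    assignment: dict[str, str] = {}
    for job in jobs:
        best: Optional[Machine] = None
        best_finish = float("inf")
        for m in machines:
            if m.kind not in job.proc_s:
                continue  # this kind of unit cannot run the job
            setup = 0.0 if m.config == job.job_class else m.setup_s
            finish = m.free_at + setup + job.proc_s[m.kind]
            if finish < best_finish:
                best, best_finish = m, finish
        if best is None:
            raise ValueError(f"no compatible machine for job {job.name}")
        best.free_at, best.config = best_finish, job.job_class
        assignment[job.name] = best.name
    makespan = max(m.free_at for m in machines)
    return assignment, makespan


machines = [Machine("cpu0", "cpu", setup_s=0.0), Machine("fpga0", "fpga", setup_s=5.0)]
jobs = [
    Job("j1", "A", {"cpu": 10.0, "fpga": 2.0}),
    Job("j2", "A", {"cpu": 10.0, "fpga": 2.0}),
    Job("j3", "B", {"cpu": 4.0, "fpga": 1.0}),
]
assignment, makespan = greedy_schedule(jobs, machines)
# j1 pays the FPGA setup once, j2 reuses it, j3 goes to the CPU; makespan 9.0
```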
The aforementioned scheduling problems are abstract enough to be evaluated theoretically by competitive analysis. The following problems consider the demands of real compute centers in more detail and will therefore be investigated with experimental methods (simulation or emulation), with a focus on the network as the bottleneck. Coping efficiently with overloaded networks is the central problem in practice. We will therefore place software components such that the load at the bottlenecks is reduced. For this kind of scheduling, we will use the information about application characteristics provided by the oracle as well as the meta data contained in an OTF service. We will evaluate how this information can be used at runtime to optimize the operation of the compute center.
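A toy version of bottleneck-aware placement can be sketched as follows, assuming two racks with a fixed number of slots, connected by a single bottleneck link, and pairwise traffic volumes between components taken from the service meta data. The sketch searches all feasible splits exhaustively, which is only viable for a handful of components; practical placers would rely on heuristics or graph partitioning. The function names are hypothetical.

```python
from itertools import combinations


def cross_traffic(rack_a: set[str], traffic: dict[tuple[str, str], float]) -> float:
    """Traffic that has to cross the bottleneck link between rack A and rack B."""
    return sum(v for (u, w), v in traffic.items() if (u in rack_a) != (w in rack_a))


def place_two_racks(components: list[str], traffic: dict[tuple[str, str], float],
                    slots_per_rack: int) -> tuple[set[str], set[str], float]:
    """Split components over two racks so that the bottleneck load is minimal.

    Exhaustive search: exponential in len(components), so only for toy instances.
    """
    n = len(components)
    best = None
    # Rack A must hold enough components that the rest still fit into rack B.
    for size in range(max(0, n - slots_per_rack), min(n, slots_per_rack) + 1):
        for chosen in combinations(components, size):
            rack_a = set(chosen)
            load = cross_traffic(rack_a, traffic)
            if best is None or load < best[2]:
                best = (rack_a, set(components) - rack_a, load)
    if best is None:
        raise ValueError("components do not fit into two racks")
    return best


traffic = {("a", "b"): 100.0, ("c", "d"): 80.0, ("b", "c"): 5.0, ("a", "d"): 1.0}
rack_a, rack_b, load = place_two_racks(["a", "b", "c", "d"], traffic, slots_per_rack=2)
# The chatty pairs a-b and c-d are co-located; only 6.0 units cross the link.
```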
Finally, we plan to build a prototype of a single OTF Compute Center that incorporates the architecture and scheduling algorithms described above. A simulation framework will complement our theoretical evaluations with experimental results.
Multiple OTF Compute Centers
In this area, we deal with the execution of OTF services in multiple OTF Compute Centers. Typically, OTF services are executed on OTF Compute Centers that are advantageously positioned with respect to their inputs and outputs. This raises the question of where to execute which part of an OTF service. This model assumes cooperating OTF Compute Centers, e.g., centers that belong to the same OTF provider. In the context of competing compute centers, we will cooperate with subproject A3 and use game-theoretic approaches as they have been considered in the area of selfish scheduling.
As in the previous working area, we will propose an architecture in which requests can be assigned to a suitable compute center while taking workload and network topology into account. A special focus lies on the repeated execution of similar requests. We are confident that we can build upon our preliminary studies on Distributed Cloud Computing and Network Function Virtualization.
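One simple way to exploit repeated similar requests is to memoize assignment decisions. The sketch below assumes that a request can be summarized by a signature (here, service name and input region) and that a hypothetical placement routine `place` is expensive enough to be worth caching; cached decisions pointing at a center are dropped when that center's workload changes. `CachingDispatcher` and its methods are illustrative names only.

```python
from typing import Callable


class CachingDispatcher:
    """Assigns requests to compute centers, reusing decisions for similar requests.

    A request is summarized by a hypothetical signature (service, input region);
    the expensive placement routine only runs on a cache miss.
    """

    def __init__(self, place: Callable[[str, str], str]):
        self._place = place
        self._cache: dict[tuple[str, str], str] = {}
        self.misses = 0

    def dispatch(self, service: str, region: str) -> str:
        key = (service, region)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._place(service, region)
        return self._cache[key]

    def invalidate(self, center: str) -> None:
        """Forget all decisions that point at a center, e.g., after its load changed."""
        self._cache = {k: c for k, c in self._cache.items() if c != center}


d = CachingDispatcher(lambda service, region: "center-eu" if region == "eu" else "center-us")
d.dispatch("transcode", "eu")   # runs the placement routine
d.dispatch("transcode", "eu")   # served from the cache
```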
Regarding scheduling in multiple OTF Compute Centers, we will focus on the assignment of jobs to compute centers: given a number of jobs whose inputs arrive at certain locations and whose outputs are requested at certain locations, we look for compute centers such that communication and execution costs are minimized.
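Under simplifying assumptions (uncapacitated centers, a per-unit transfer cost between network nodes, a fixed execution cost per center), the assignment problem decouples per job: each job can be placed on the center that minimizes the cost of shipping its input, executing it, and shipping its output. The Python sketch below illustrates this cost model; the names `Job`, `job_cost`, and `assign` are hypothetical, and once capacities are added, this per-job rule no longer solves the problem.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    source: str       # network node where the job's input arrives
    sink: str         # network node where the result is requested
    data_in: float    # input volume shipped from source to the chosen center
    data_out: float   # output volume shipped from the center to the sink


Dist = dict[tuple[str, str], float]  # per-unit transfer cost between two nodes


def job_cost(job: Job, center: str, dist: Dist, exec_cost: dict[str, float]) -> float:
    """Communication cost (distance times volume, in and out) plus execution cost."""
    return (dist[job.source, center] * job.data_in
            + exec_cost[center]
            + dist[center, job.sink] * job.data_out)


def assign(jobs: list[Job], centers: list[str], dist: Dist,
           exec_cost: dict[str, float]) -> dict[str, str]:
    """Pick, independently per job, the center with minimal total cost (no capacities)."""
    return {j.name: min(centers, key=lambda c: job_cost(j, c, dist, exec_cost)) for j in jobs}


dist = {("s", "c1"): 1.0, ("c1", "t"): 10.0, ("s", "c2"): 3.0, ("c2", "t"): 2.0}
exec_cost = {"c1": 5.0, "c2": 5.0}
jobs = [Job("input-heavy", "s", "t", 10.0, 1.0), Job("output-heavy", "s", "t", 1.0, 10.0)]
print(assign(jobs, ["c1", "c2"], dist, exec_cost))
# {'input-heavy': 'c1', 'output-heavy': 'c2'}
```

Input-heavy jobs are drawn toward centers close to their input, output-heavy jobs toward centers close to where the result is requested.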
For competing compute centers, on the other hand, we examine game-theoretic models with a given number of players, each operating some compute centers. Players can make offers for different jobs, and each job is executed on the compute center with the best offer.
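The offer mechanism can be sketched as follows, assuming that the best offer is the lowest price, that ties are broken by player name, and that a player's profit is the price received minus its cost of running the job. These rules, and the name `run_offers`, are illustrative assumptions rather than the model to be studied.

```python
def run_offers(offers: dict[str, dict[str, float]],
               costs: dict[str, dict[str, float]]) -> tuple[dict[str, str], dict[str, float]]:
    """Award every job to the lowest offer and compute each player's profit.

    offers[player][job] is the price a player asks for running the job;
    costs[player][job] is the player's own cost of running it on its centers.
    Ties are broken in favor of the alphabetically first player (an assumption).
    """
    winners: dict[str, str] = {}
    profit = {player: 0.0 for player in offers}
    jobs = sorted({job for per_job in offers.values() for job in per_job})
    for job in jobs:
        bids = [(per_job[job], player) for player, per_job in offers.items() if job in per_job]
        price, player = min(bids)  # lowest price wins; tuples compare names on ties
        winners[job] = player
        profit[player] += price - costs[player][job]
    return winners, profit


offers = {"A": {"j1": 10.0, "j2": 8.0}, "B": {"j1": 9.0, "j2": 8.0}}
costs = {"A": {"j1": 5.0, "j2": 4.0}, "B": {"j1": 7.0, "j2": 6.0}}
winners, profit = run_offers(offers, costs)
print(winners)  # {'j1': 'B', 'j2': 'A'}
print(profit)   # {'A': 4.0, 'B': 2.0}
```

In such a model, a player's strategy is its vector of offers, and a pure Nash equilibrium is a profile of offers in which no player can increase its profit by unilaterally changing its own offers; `run_offers` is the payoff function that such an analysis would evaluate.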
Additionally, we will develop a prototype and evaluate the results using tools such as MaxiNet, which we have developed ourselves.
OTF Compute Center & Cloud
Motivated by resource leasing in the cloud, we want to evaluate scenarios in which resources are exclusively or additionally rented from resource providers. This adds a scheduling decision: the scheduler must decide not only which jobs to execute when and where, but also which resources to lease and for how long. The study of leasing problems dates back to the so-called Parking Permit Problem introduced by Meyerson in 2005, which we have since extended.
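Competitive analysis compares an online leasing algorithm against the optimal offline cost. The sketch below computes that offline benchmark for a simplified parking-permit variant, assuming that leases of given durations and costs may start on any day and that the set of days with demand is known in advance; it is meant only to illustrate the benchmark, not to reproduce a model or algorithm from the literature. The dynamic program relies on the observation that the earliest uncovered demand day must be covered by some lease, and starting that lease exactly on that day never hurts. The function name is hypothetical.

```python
from bisect import bisect_left
from functools import lru_cache


def offline_leasing_cost(demand_days: list[int], leases: list[tuple[int, float]]) -> float:
    """Minimum total cost of leases (duration_days, cost) covering all demand days.

    Simplified model: a lease may start on any day and covers days
    start, ..., start + duration - 1.
    """
    if not leases or any(duration <= 0 for duration, _ in leases):
        raise ValueError("need at least one lease type, all with positive duration")
    days = sorted(set(demand_days))

    @lru_cache(maxsize=None)
    def best(i: int) -> float:
        """Cheapest way to cover days[i:], given that days[:i] are already covered."""
        if i == len(days):
            return 0.0
        options = []
        for duration, cost in leases:
            # Start the lease on the earliest uncovered demand day days[i];
            # j is the index of the first demand day it no longer covers.
            j = bisect_left(days, days[i] + duration)
            options.append(cost + best(j))
        return min(options)

    return best(0)


leases = [(1, 1.0), (7, 4.0)]  # a day permit and a week permit
print(offline_leasing_cost([0, 1, 2, 3, 4, 5], leases))  # 4.0: one week permit
print(offline_leasing_cost([0, 10], leases))             # 2.0: two day permits
```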
We are interested in leasing problems in which leased units have limited capacity, including heterogeneous models where units of different capacities and costs may be leased. In our theoretical models, heterogeneous machines can be rented for different durations, each at a certain cost. The scheduler aims to minimize the overall induced cost.
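The capacitated, heterogeneous leasing model can be made concrete with a small checker, assuming discrete time steps, an integer demand per step, and lease types characterized by capacity, duration, and cost. The sketch only evaluates a given leasing plan (feasibility and total cost); computing a good plan, online or offline, is exactly the scheduling problem described above. `LeaseType` and `plan_cost` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LeaseType:
    capacity: int   # e.g., number of jobs a leased unit can process per time step
    duration: int   # number of time steps the lease lasts
    cost: float     # price of one lease of this type


def plan_cost(demand: list[int], plan: list[tuple[LeaseType, int]]) -> float:
    """Return the total cost of a leasing plan, or raise if some demand is not covered.

    plan contains (lease type, start step); a lease covers steps start .. start+duration-1.
    """
    available = [0] * len(demand)
    for lease, start in plan:
        for t in range(start, min(start + lease.duration, len(demand))):
            available[t] += lease.capacity
    for t, (need, have) in enumerate(zip(demand, available)):
        if have < need:
            raise ValueError(f"step {t}: demand {need} exceeds leased capacity {have}")
    return sum(lease.cost for lease, _ in plan)


small = LeaseType(capacity=1, duration=1, cost=1.0)
big = LeaseType(capacity=4, duration=5, cost=10.0)
demand = [3, 3, 1, 0, 4]
print(plan_cost(demand, [(big, 0)]))  # 10.0: one large lease covers every step
```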