Cellular Disco: resource management using virtual clusters on shared-memory multiprocessors

Citation
K. Govil et al., Cellular Disco: resource management using virtual clusters on shared-memory multiprocessors, ACM T COMP, 18(3), 2000, pp. 229-262
Number of citations
30
Language
English
Article type
Article
Subject Categories
Computer Science & Engineering
Journal title
ACM TRANSACTIONS ON COMPUTER SYSTEMS
ISSN journal
0734-2071
Volume
18
Issue
3
Year of publication
2000
Pages
229 - 262
Database
ISI
SICI code
0734-2071(200008)18:3<229:CDRMUV>2.0.ZU;2-A
Abstract
Despite the fact that large-scale shared-memory multiprocessors have been commercially available for several years, system software that fully utilizes all their features is still not available, mostly due to the complexity and cost of making the required changes to the operating system. A recently proposed approach, called Disco, substantially reduces this development cost by using a virtual machine monitor that leverages the existing operating system technology. In this paper we present a system called Cellular Disco that extends the Disco work to provide all the advantages of the hardware partitioning and scalable operating system approaches. We argue that Cellular Disco can achieve these benefits at only a small fraction of the development cost of modifying the operating system. Cellular Disco effectively turns a large-scale shared-memory multiprocessor into a virtual cluster that supports fault containment and heterogeneity, while avoiding operating system scalability bottlenecks. Yet at the same time, Cellular Disco preserves the benefits of a shared-memory multiprocessor by implementing dynamic, fine-grained resource sharing, and by allowing users to overcommit resources such as processors and memory. This hybrid approach requires a scalable resource manager that makes local decisions with limited information while still providing good global performance and fault containment. In this paper we describe our experience with a Cellular Disco prototype on a 32-processor SGI Origin 2000 system. We show that the execution time penalty for this approach is low, typically within 10% of the best available commercial operating system for most workloads, and that it can manage the CPU and memory resources of the machine significantly better than the hardware partitioning approach.