Despite the fact that large-scale shared-memory multiprocessors have been commercially available for several years, system software that fully utilizes all their features is still not available, mostly due to the complexity and cost of making the required changes to the operating system. A recently proposed approach, called Disco, substantially reduces this development cost by using a virtual machine monitor that leverages the existing operating system technology. In this paper we present a system called Cellular Disco that extends the Disco work to provide all the advantages of the hardware partitioning and scalable operating system approaches. We argue that Cellular Disco can achieve these benefits at only a small fraction of the development cost of modifying the operating system. Cellular Disco effectively turns a large-scale shared-memory multiprocessor into a virtual cluster that supports fault containment and heterogeneity, while avoiding operating system scalability bottlenecks. Yet at the same time, Cellular Disco preserves the benefits of a shared-memory multiprocessor by implementing dynamic, fine-grained resource sharing, and by allowing users to overcommit resources such as processors and memory. This hybrid approach requires a scalable resource manager that makes local decisions with limited information while still providing good global performance and fault containment. In this paper we describe our experience with a Cellular Disco prototype on a 32-processor SGI Origin 2000 system. We show that the execution time penalty for this approach is low, typically within 10% of the best available commercial operating system for most workloads, and that it can manage the CPU and memory resources of the machine significantly better than the hardware partitioning approach.