The goal of this effort is to initiate a formal study of the impact of OS jitter on the scaling of parallel applications. We focus on a particularly important class of parallel applications that often arises in scientific computing. Here, typically, each node in the cluster repeatedly performs a computation stage followed by a collective operation, such as a barrier. We model this theoretically and demonstrate the effect of OS jitter (or noise) on the performance of such parallel applications. We study three natural and important classes of noise distributions:
- The exponential distribution,
- The heavy-tailed distribution, and
- The Bernoulli distribution.
We show that such systems scale well in the presence of exponential noise, but that their performance degrades drastically in the presence of heavy-tailed or Bernoulli noise. Although our model is very simple, it is powerful enough to predict the effect of noise on scaling. We believe that this study will also be useful for identifying and addressing scalability bottlenecks in a more systematic way, for instance by designing scheduling policies that take the nature of the noise into account to improve overall system performance. To the best of our knowledge, this is the first attempt to explain the impact of noise with a mathematical model.
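
The setting described above (a computation stage on every node, followed by a barrier) can be illustrated with a small Monte Carlo sketch. This is not the model from the paper: the function names, the fixed per-phase work, and every distribution parameter below are assumptions chosen only for illustration. The one idea it encodes is that a barrier makes each phase as slow as the slowest node, so the phase time is the maximum over all nodes of work plus noise.

```python
import random

def phase_time(n_nodes, work, noise, rng):
    """One compute stage plus barrier: the slowest node sets the pace."""
    return max(work + noise(rng) for _ in range(n_nodes))

def mean_phase_time(n_nodes, noise, work=1.0, phases=200, seed=0):
    """Average phase time over many simulated phases (seeded for repeatability)."""
    rng = random.Random(seed)
    total = sum(phase_time(n_nodes, work, noise, rng) for _ in range(phases))
    return total / phases

# Illustrative noise models; all parameters are assumptions, not values from the paper.
def exponential_noise(rng):
    return rng.expovariate(10.0)            # exponential with mean 0.1

def heavy_tailed_noise(rng):
    return 0.05 * rng.paretovariate(1.5)    # Pareto, one example of a heavy tail

def bernoulli_noise(rng):
    return 1.0 if rng.random() < 0.01 else 0.0   # rare, fixed-size delay

if __name__ == "__main__":
    models = [("exponential", exponential_noise),
              ("heavy-tailed", heavy_tailed_noise),
              ("Bernoulli", bernoulli_noise)]
    for n in (16, 256, 2048):
        row = ", ".join(f"{name}: {mean_phase_time(n, f):.3f}" for name, f in models)
        print(f"nodes={n:5d}  {row}")
```

Running the script prints the mean phase time for each noise model at several cluster sizes, which gives a rough feel for how the slowdown grows with the number of nodes under each assumed distribution.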
For more details, please refer to our paper at HiPC 2005.
