The goal of this effort is to validate the theoretical model of OS Jitter with data collected from large production clusters. Details revealed through empirical study help in fine-tuning the model, which in turn allows us to establish a methodology for predicting the performance of large clusters. As part of this effort, we have designed a parallel benchmark that measures the OS Jitter distribution in a cluster. The parallel benchmark kernel is shown in the following figure.
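As a rough illustration of what a kernel of this kind might look like, the sketch below times a fixed amount of work between two barriers. It is not the benchmark from the paper: the names `fixed_work` and `run_kernel` are hypothetical, Python threads with `threading.Barrier` stand in for cluster nodes and a collective barrier, and the operation count is arbitrary.

```python
import threading
import time


def fixed_work(n_ops):
    """Perform a fixed number of operations, so elapsed time = work + jitter."""
    acc = 0
    for k in range(n_ops):
        acc += k
    return acc


def run_kernel(n_workers, n_iters, n_ops):
    """Return samples[rank][iteration] = elapsed seconds for one work quantum.

    Assumption: each worker plays the role of one node; the second barrier
    makes every iteration last as long as the slowest worker.
    """
    barrier = threading.Barrier(n_workers)
    samples = [[0.0] * n_iters for _ in range(n_workers)]

    def worker(rank):
        for it in range(n_iters):
            barrier.wait()                 # align the start of the iteration
            t_s = time.perf_counter()      # start timestamp
            fixed_work(n_ops)              # the work quantum
            barrier.wait()                 # wait for the slowest worker
            t_e = time.perf_counter()      # end timestamp
            samples[rank][it] = t_e - t_s

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return samples
```

Calling `run_kernel(4, 100, 100_000)` yields one list of per-iteration timings per worker. On a real cluster the same structure would more likely be written with a message-passing barrier, and the threads here share one interpreter, so the absolute numbers are not meaningful.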

We ran the parallel benchmark on a large production cluster at the San Diego Supercomputer Center (SDSC) (128 8-way nodes, a total of 1024 processors) and used this data to validate the performance predictions made by the theoretical model proposed earlier. This validation step enables us to predict the performance of large clusters. We report the prediction accuracy against measurements. This is shown in the figure below as the predicted distribution of max over i = 1...N of (work(i) + jitter(i)) against the measured distributions of (t_e(i) − t_s(i)) of the parallel benchmark for a work quantum of 13 ms, where i denotes the i-th node. The predicted average deviates from the measured iteration time near the tail of the distribution.
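The prediction step can be sketched as a resampling exercise: draw one (work + jitter) sample per node and take the maximum, since an iteration ends only when the slowest node finishes. The function name `predict_iteration_times` is hypothetical, and the sketch assumes that jitter on different nodes is independent, which the model in the paper may or may not require.

```python
import random


def predict_iteration_times(per_node_samples, n_trials, seed=0):
    """Predict iteration times as max over nodes of (work + jitter).

    per_node_samples[i] holds measured (work + jitter) times for node i.
    Assumption: nodes are independent, so one sample is drawn per node.
    """
    rng = random.Random(seed)
    predicted = []
    for _ in range(n_trials):
        slowest = max(rng.choice(samples) for samples in per_node_samples)
        predicted.append(slowest)
    return predicted
```

The resulting list can be compared with the measured (t_e(i) − t_s(i)) values, for example by plotting both as cumulative distributions.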

We also discovered that measurements of jitter distributions help identify misbehaving nodes or processors. This is shown in the figure below, where the spike in the 99th percentile value from the distributions of (work(i) + jitter(i)) indicates the presence of an anomalous node.
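One simple way to automate this check is to compute the 99th percentile per node and flag nodes whose value stands well above the median across nodes. This is only a sketch: the helper names, the nearest-rank percentile, and the threshold `factor=1.5` are assumptions chosen for illustration, not values from the study.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list (p in 0..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100.0))
    return ordered[rank - 1]


def flag_anomalous_nodes(per_node_samples, p=99.0, factor=1.5):
    """Return indices of nodes whose p-th percentile exceeds factor * median.

    Assumption: most nodes are healthy, so the median of the per-node
    percentiles is a reasonable baseline.
    """
    per_node_p = [percentile(samples, p) for samples in per_node_samples]
    baseline = statistics.median(per_node_p)
    return [i for i, v in enumerate(per_node_p) if v > factor * baseline]
```

For example, if three nodes always take 13.0 ms and a fourth occasionally takes 40.0 ms, only the fourth node is flagged.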

In addition to making performance predictions, our study could be useful for improving performance. Traditional techniques for performance improvement fall into one of two categories: jitter reduction or jitter synchronization. Jitter reduction is achieved by removing several system daemons, dedicating a spare processor to absorb jitter, and reducing the frequency of daemons. Jitter synchronization is achieved by explicit coscheduling or gang scheduling [1–3]. Most of these implementations require changing the scheduling policies. Our work gives insight into another technique for improving performance, which can be called jitter smoothing. If the model predicts the actual performance reasonably well, then the systems can be tuned to ensure that the jitter does not follow a heavy-tailed distribution (i.e., does not have infrequent interruptions that take a long time). This technique may complement the other approaches currently used in large high-performance systems.
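The intuition behind jitter smoothing can be illustrated with a small simulation that compares two jitter patterns with the same mean (0.1 ms) across 1024 nodes: a constant small delay versus a rare long interruption. All numbers except the 13 ms work quantum and the 1024 processors are invented for illustration, and the two-point "rare long" pattern is a crude stand-in for a heavy-tailed distribution.

```python
import random

WORK_MS = 13.0  # work quantum, as in the measurements described above


def mean_iteration_time(jitter_fn, n_nodes, n_trials, seed=0):
    """Average over trials of max over nodes of (work + jitter), in ms."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        total += max(WORK_MS + jitter_fn(rng) for _ in range(n_nodes))
    return total / n_trials


def smooth_jitter(rng):
    """Hypothetical smoothed jitter: a small delay on every quantum."""
    return 0.1


def rare_long_jitter(rng):
    """Hypothetical rare interruption: 10 ms with probability 0.01."""
    return 10.0 if rng.random() < 0.01 else 0.0


smooth = mean_iteration_time(smooth_jitter, n_nodes=1024, n_trials=100)
rare = mean_iteration_time(rare_long_jitter, n_nodes=1024, n_trials=100)
print(f"smooth: {smooth:.2f} ms, rare-long: {rare:.2f} ms")
```

With many nodes, at least one node is almost always hit by the rare long interruption, so the iteration time is dominated by the tail even though the per-node mean jitter is identical in both cases.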
More details about this work can be found in our paper at HiPC 2006.
References
[1] J. Moreira, H. Franke, W. Chan, L. Fong, M. Jette, and A. Yoo, “A Gang-Scheduling System for ASCI Blue-Pacific,” in International Conference on High-Performance Computing and Networking, 1999.
[2] A. Hori, H. Tezuka, and Y. Ishikawa, “Highly Efficient Gang Scheduling Implementations,” in ACM/IEEE Conference on Supercomputing, 1998.
[3] E. Frachtenberg, F. Petrini, J. Fernandez, S. Pakin, and S. Coll, “STORM: Lightning-Fast Resource Management,” in ACM/IEEE Conference on Supercomputing, 2002.
