Traditionally, large-scale HPC systems have avoided jitter by running a specialized lightweight operating system on compute nodes [2] [3]. However, this limits the use of such systems, because most applications are written for commercial operating systems and cannot run on them. This has led to efforts to create lightweight versions of commodity operating systems such as Linux that can be used on the compute nodes of large-scale HPC systems [4] [5] [6].
Creating a lightweight version of a commodity operating system requires a detailed study that identifies the sources of OS jitter and quantitatively measures their impact. Apart from the well-known ill effects of operating system clock ticks or timer interrupts [1], there is little data available about other system daemons and interrupts that contribute to OS jitter. Furthermore, tuning an out-of-the-box commodity operating system is the first step towards mitigating the effects of OS jitter. In the absence of quantitative information about the jitter caused by various system daemons and interrupts, system administrators resort to established knowledge and ad hoc methods when tuning a system for HPC applications. This process not only requires highly knowledgeable system administrators but is also error-prone, because new versions of these commodity operating systems are released at fairly regular intervals and introduce new sources of OS jitter.
Identifying all possible sources of OS jitter and measuring their impact on an application requires a detailed trace of OS activity. Most existing general-purpose OS profiling tools, such as OProfile [7] or the Linux kernel scheduler stats [8], provide only a coarse measure in terms of time spent in each kernel function or process and do not separately measure the jitter that an application perceives from each jitter source. Benchmarks developed specifically for studying OS jitter, such as the selfish-detour benchmark [9], can be used to measure OS jitter on a wide range of platforms and to study its effect on parallel program performance. However, they do not provide any information about which daemons and interrupts contribute to OS jitter, or by how much.
We have designed and implemented a tool that helps identify sources of OS jitter on a commodity operating system such as Linux and can be used to quantitatively measure the jitter contributed by various system daemons and interrupts. The tool combines the technique used by micro-benchmarks that study OS jitter by reading the CPU timestamp counter [9] with the profiling techniques used by kernel profiling tools such as OProfile [7]. Our methodology consists of running a user-level micro-benchmark and measuring the latencies it experiences. We then attribute each latency to operating system daemons and interrupts using data obtained from kernel instrumentation.
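The micro-benchmark half of this methodology can be illustrated with a minimal sketch. It is not the tool itself: it assumes Python's `time.perf_counter_ns` as a stand-in for the CPU timestamp counter, and the function names and the 10 us threshold are hypothetical choices made for illustration. The idea is to read the clock in a tight loop and treat any unusually large gap between successive reads as time taken away from the benchmark.

```python
import time


def sample_timestamps(n_samples):
    """Read a monotonic nanosecond clock n_samples times in a tight loop.

    Assumption: perf_counter_ns stands in for the CPU timestamp counter.
    """
    read = time.perf_counter_ns
    return [read() for _ in range(n_samples)]


def find_detours(timestamps, threshold_ns):
    """Return (start_ns, length_ns) for each gap between successive reads
    that is longer than threshold_ns."""
    detours = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > threshold_ns:
            detours.append((prev, gap))
    return detours


if __name__ == "__main__":
    ts = sample_timestamps(200_000)
    # Hypothetical threshold: gaps above 10 us are treated as jitter.
    detours = find_detours(ts, threshold_ns=10_000)
    print(f"{len(detours)} gaps longer than 10 us")
```

Interpreter overhead makes the gaps measured this way far coarser than those seen by a native timestamp-counter loop, so the sketch shows only the structure of the measurement, not its resolution.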
Our results reveal that while 63% of the total jitter comes from timer interrupts (on an out-of-the-box Fedora Core 5 Linux system running in run level 3), the rest comes from various system daemons and interrupts, most of which can be easily eliminated. This is shown in the table below.

We validated our methodology by introducing synthetic daemons and reliably detecting them, which illustrates how the tool can be used to detect new sources of OS jitter that are introduced as software is installed and upgraded on a tuned system over time. This is shown in the figures below, where one synthetic daemon, dummydaemon_1, was introduced so that it wakes up once every 10 seconds and takes approximately 2300 us to finish.
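A synthetic daemon of this kind might look like the following minimal sketch. The implementation of dummydaemon_1 is not described here, so everything below is an assumption: the function names are hypothetical, the work is a plain busy loop, and sleeping after each burst makes the period slightly longer than 10 seconds.

```python
import time


def busy_work(duration_us):
    """Spin on the CPU for about duration_us microseconds.

    Returns the elapsed time actually measured, in microseconds.
    """
    start = time.perf_counter_ns()
    target_ns = duration_us * 1_000
    while time.perf_counter_ns() - start < target_ns:
        pass
    return (time.perf_counter_ns() - start) / 1_000


def run_daemon(period_s=10.0, work_us=2300, iterations=None):
    """Burn work_us of CPU, then sleep period_s, and repeat.

    iterations=None runs until interrupted; a number bounds the loop.
    Returns the number of bursts performed.
    """
    count = 0
    while iterations is None or count < iterations:
        busy_work(work_us)
        count += 1
        time.sleep(period_s)
    return count


# Usage (runs until interrupted):
# run_daemon(period_s=10.0, work_us=2300)
```

Running such a process alongside the micro-benchmark should add a small cluster of latencies near the chosen work duration, which is the kind of signature the comparison is meant to expose.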
[Figure: Microbenchmark results for the default configuration compared with the configuration that includes a synthetic daemon. The x-axis is the jitter perceived by the benchmark and the y-axis is a logarithmic function of the number of sample points.]

[Figure: Zooming into the 2000-2500 us range to detect the jitter sources. All jitter sources that occur in succession are concatenated by a "_".]
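The attribution step behind these labels can be sketched as follows. This is an illustration under stated assumptions rather than the tool's actual algorithm: the trace record layout `(name, start, end)`, the function names, the "unknown" fallback label, and the sample data are all hypothetical. Each latency seen by the benchmark is matched against the kernel trace events that overlap it in time, and the names of those events are joined with "_" in order of occurrence.

```python
def attribute_detours(detours, events):
    """Label each detour with the trace events that overlap it in time.

    detours: list of (start, length); events: list of (name, start, end).
    All times share one unit (microseconds in the example below).
    Returns a list of (length, label) pairs.
    """
    ordered = sorted(events, key=lambda e: e[1])
    attributed = []
    for d_start, d_len in detours:
        d_end = d_start + d_len
        names = [name for name, e_start, e_end in ordered
                 if e_start < d_end and e_end > d_start]
        # "unknown" is an assumed fallback for detours with no matching event.
        attributed.append((d_len, "_".join(names) if names else "unknown"))
    return attributed


def sources_in_range(attributed, lo, hi):
    """Count labels of detours whose length falls in [lo, hi)."""
    counts = {}
    for length, label in attributed:
        if lo <= length < hi:
            counts[label] = counts.get(label, 0) + 1
    return counts


# Hypothetical data: a timer interrupt followed directly by the daemon.
detours = [(1000, 2350), (50000, 12)]
events = [("timer", 1000, 1010),
          ("dummydaemon_1", 1010, 3340),
          ("timer", 50000, 50010)]
labelled = attribute_detours(detours, events)
print(sources_in_range(labelled, 2000, 2500))  # {'timer_dummydaemon_1': 1}
```

Filtering by a latency range mirrors the zoomed view: only the sources responsible for latencies in that range are reported.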
This work has been accepted for publication and presentation at IEEE Cluster 2007, to be held in Austin, Texas, in September 2007.
References
[1] D. Tsafrir, Y. Etsion, D. G. Feitelson, and S. Kirkpatrick, “System Noise, OS Clock Ticks, and Fine-grained Parallel Applications,” in ICS, 2005.
[2] S. M. Kelly and R. Brightwell, “Software architecture of the light weight kernel, Catamount,” in 47th Cray User Group Conference, May 2005.
[3] R. Brightwell, R. Riesen, K. Underwood, T. B. Hudson, P. Bridges, and A. B. Maccabe, “A Performance Comparison of Linux and a Lightweight Kernel,” in IEEE International Conference on Cluster Computing, 2003.
[4] “Right-weight Linux Kernel Project at Los Alamos National Laboratory.” [Online]. Available: http://public.lanl.gov/cluster/projects/index.html
[5] L. S. Kaplan, “Lightweight Linux for High-Performance Computing,” in LinuxWorld.com, December 2006. [Online]. Available: http://www.linuxworld.com/news/2006/120406-lightweight-linux.html
[6] “ZeptoOS: The small Linux for big computers.” [Online]. Available: http://www-unix.mcs.anl.gov/zeptoos/
[7] “OProfile: A System Profiler for Linux.” [Online]. Available: http://sourceforge.net/projects/oprofile/
[8] “Linux kernel scheduler statistics.” [Online]. Available: http://www.mjmwired.net/kernel/Documentation/sched-stats.txt
[9] “Selfish detour benchmark suite.” [Online]. Available: http://www-unix.mcs.anl.gov/zeptoos/software/index.php


