Infrastructure Management Services

Problem Determination and Localization in Middleware Systems
Enterprise middleware systems typically consist of a large cluster of machines with stringent performance requirements. Hence, when a performance problem occurs in such an environment, it is critical that the health monitoring software identify the root cause with minimal delay. Furthermore, clustered enterprise middleware systems that employ dynamic workload scheduling are susceptible to a variety of application malfunctions that can manifest themselves in counterintuitive ways and cause debilitating damage.

Until now, diagnosing problems in this domain has involved investigating log files and configuration settings, and has required in-depth knowledge of the middleware architecture and application design.
Using Change Point based Problem Signatures
The middleware systems management group at IRL works on various system management and health monitoring aspects of enterprise middleware systems. Our research focuses on problem determination using change point detection techniques and problem signatures, where a signature is a combination of changes (or absence of changes) in different metrics. We have implemented this approach on a clustered middleware system and applied it to the detection of the storm drain condition: a debilitating problem encountered in clustered systems with counterintuitive symptoms. This capability is now part of a released product. This work was published at the 10th IEEE/IFIP Network Operations and Management Symposium (NOMS 2006), Vancouver, Canada, April 2006.
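The idea of a signature built from changes, or absence of changes, across several metrics can be sketched roughly as follows. This is only an illustration under stated assumptions, not the detector used in the released product: the two-window mean comparison, the function names, the metric names, and the example signature are all hypothetical.

```python
from statistics import mean, stdev


def detect_change(series, window=5, threshold=3.0):
    """Classify the latest window of a metric as 'up', 'down', or 'none'.

    Assumed technique: compare the mean of the last `window` samples with
    the mean of the `window` samples before them, scaled by the spread of
    the earlier window. This is a simple stand-in for a real change point
    detector.
    """
    if len(series) < 2 * window:
        return "none"
    before = series[-2 * window:-window]
    after = series[-window:]
    spread = stdev(before) or 1e-9  # guard against a perfectly flat baseline
    shift = (mean(after) - mean(before)) / spread
    if shift > threshold:
        return "up"
    if shift < -threshold:
        return "down"
    return "none"


def matches_signature(metrics, signature, **detector_args):
    """True when every metric shows the change (or non-change) the signature expects."""
    return all(
        detect_change(metrics[name], **detector_args) == expected
        for name, expected in signature.items()
    )


# Hypothetical signature: one metric changes abruptly while another does not.
example_signature = {"response_time": "down", "cpu_utilization": "none"}

example_metrics = {
    "response_time": [100, 102, 98, 101, 99, 20, 21, 19, 20, 22],
    "cpu_utilization": [50, 51, 49, 50, 52, 51, 50, 49, 50, 51],
}

signature_fired = matches_signature(example_metrics, example_signature)
```

Note that the absence of a change is treated as evidence in its own right: the signature fires only if the second metric stays flat while the first one shifts.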

As an extension of the above work, we have designed and implemented a generic framework that allows an administrator to specify any health problem as a Boolean combination of static threshold violations and/or abrupt changes in the values of runtime metrics such as response time, throughput, CPU utilization, memory utilization, and number of queued requests.
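One way to picture such a specification is as small predicates over metric histories that are composed with AND, OR, and NOT. The sketch below is an assumption about how this could look, not the framework's actual interface: the helper names (`threshold`, `abrupt_change`, `all_of`, `any_of`, `negate`), the ratio-based change test, and the example rule are all hypothetical.

```python
from statistics import mean


def threshold(metric, limit):
    """Static threshold violation: the latest sample exceeds `limit`."""
    return lambda metrics: metrics[metric][-1] > limit


def abrupt_change(metric, window=3, ratio=2.0):
    """Abrupt change: the recent window mean differs from the previous
    window mean by more than a factor of `ratio` (assumed, simplistic test)."""
    def check(metrics):
        series = metrics[metric]
        if len(series) < 2 * window:
            return False
        before = abs(mean(series[-2 * window:-window]))
        after = abs(mean(series[-window:]))
        low, high = sorted((before, after))
        return high > ratio * max(low, 1e-9)
    return check


def all_of(*conditions):
    """Boolean AND over conditions."""
    return lambda metrics: all(c(metrics) for c in conditions)


def any_of(*conditions):
    """Boolean OR over conditions."""
    return lambda metrics: any(c(metrics) for c in conditions)


def negate(condition):
    """Boolean NOT of a condition."""
    return lambda metrics: not condition(metrics)


# Hypothetical rule: CPU above 90% AND (response time jumped OR queue is long).
overload = all_of(
    threshold("cpu_utilization", 90),
    any_of(abrupt_change("response_time"), threshold("queued_requests", 100)),
)

stressed = {
    "cpu_utilization": [60, 65, 70, 92, 95, 97],
    "response_time": [100, 110, 105, 300, 320, 310],
    "queued_requests": [5, 6, 5, 7, 8, 9],
}
healthy = dict(stressed, cpu_utilization=[40, 41, 40, 42, 41, 40])

stressed_result = overload(stressed)
healthy_result = overload(healthy)
```

Because each condition is just a function from metric histories to a Boolean, an administrator-facing rule language could map onto these combinators without changing the detection code.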

More details about our work on change point based problem signatures can be found on this page.

Learning Problem Signatures
However, such predefined signatures still require some domain expertise to define. We have also been working on a methodology that learns problem signatures from administrator feedback, thereby removing the dependence on domain experts for creating correct problem signatures and allowing for different signatures. Additionally, the problem signatures generated by our method are flexible, do not require exact matches for triggering, and evolve as more information becomes available. This work was published at the 17th IFIP/IEEE International Workshop on Distributed Systems: Operations and Management (DSOM), held in Dublin, Ireland, in October 2006, where it received a best paper award.
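To make the notions of inexact matching and evolution through feedback concrete, here is a minimal sketch. It is not the published learning method: the frequency-count scheme, the `LearnedSignature` class, the 0.7 match threshold, and the metric names are assumptions chosen purely for illustration.

```python
from collections import Counter, defaultdict


class LearnedSignature:
    """Hypothetical signature that accumulates, per metric, how often each
    change state ('up', 'down', 'none') was seen in confirmed incidents."""

    def __init__(self, match_threshold=0.7):
        self.counts = defaultdict(Counter)  # metric -> Counter of states
        self.confirmed = 0                  # number of confirmed incidents
        self.match_threshold = match_threshold

    def score(self, observation):
        """Average, over known metrics, of the fraction of confirmed
        incidents in which the observed state also occurred."""
        if self.confirmed == 0:
            return 0.0
        total = sum(
            self.counts[metric][observation.get(metric)] / self.confirmed
            for metric in self.counts
        )
        return total / len(self.counts)

    def matches(self, observation):
        """Inexact match: fire when the score clears the threshold."""
        return self.score(observation) >= self.match_threshold

    def feedback(self, observation, is_problem):
        """Administrator feedback; in this sketch only confirmed problems
        update the signature, rejected alerts are ignored."""
        if is_problem:
            for metric, state in observation.items():
                self.counts[metric][state] += 1
            self.confirmed += 1


sig = LearnedSignature()
sig.feedback({"response_time": "down", "throughput": "up", "cpu": "none"}, True)
sig.feedback({"response_time": "down", "throughput": "up", "cpu": "up"}, True)

similar = {"response_time": "down", "throughput": "up", "cpu": "none"}
unrelated = {"response_time": "up", "throughput": "none", "cpu": "none"}

similar_fires = sig.matches(similar)
unrelated_fires = sig.matches(unrelated)
```

In this sketch a metric that behaved inconsistently across confirmed incidents simply contributes less to the score, so an observation can trigger the signature without agreeing on every metric.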

More details about this work can be found on this page.



Last updated 5 Feb 2009