Impact of OS Jitter ("Noise") on Performance of Collectives
Technical Staff Member, IBM India Research Lab, New Delhi, April 2006 - present.
- This project involves studying the impact of operating system interrupts and system daemons (referred to as "OS Jitter") on the scaling of tightly coupled parallel applications (referred to as "Collectives") running on very large clusters.
- To study the impact of noise, this project uses microbenchmarks that collect large volumes of time-series performance data and build frequency distributions (histograms) from them. Two benchmarks are currently in use: a) a parallel benchmark that represents a typical tightly coupled parallel application, doing some work followed by a “Collective” call (a barrier/gather/scatter) in each phase; b) a single-node benchmark that samples the clock cycle register on the CPU in a tight loop and computes the difference between successive readings to estimate the jitter experienced by the benchmark.
- My work has primarily focused on a methodology for identifying sources of OS Jitter on a given system by running the single-node benchmark and instrumenting process scheduling and interrupt handling in the Linux 2.6 kernel. The benchmark data and kernel instrumentation data are then analyzed and compared to identify sources of OS Jitter and measure their impact. This work has been accepted for publication and presentation at IEEE Cluster 2007, to be held in Austin, Texas, in September 2007.
- Also responsible for porting (from C to C++), redesigning and implementing the data collector tool, with new interfaces for time series data, workloads and frequency distributions that make the tool more flexible and reusable.
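The idea behind the single-node benchmark can be illustrated with a minimal sketch. It is not the project's code: Python cannot read the CPU cycle register directly, so `time.perf_counter_ns()` stands in for it, and the function names, bucket width and threshold are assumptions chosen for illustration.

```python
import time
from collections import Counter


def sample_deltas(n_samples):
    """Read a monotonic clock in a tight loop; return successive differences in ns."""
    stamps = [0] * n_samples  # preallocate so the loop does as little extra work as possible
    for i in range(n_samples):
        stamps[i] = time.perf_counter_ns()  # stand-in for the CPU cycle register (assumption)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


def histogram(deltas, bucket_ns):
    """Frequency distribution: lower bound of each bucket (ns) -> number of samples."""
    return Counter((d // bucket_ns) * bucket_ns for d in deltas)


def jitter_events(deltas, threshold_ns):
    """Samples whose gap exceeds the threshold, i.e. the loop was likely interrupted."""
    return [(index, d) for index, d in enumerate(deltas) if d > threshold_ns]


if __name__ == "__main__":
    deltas = sample_deltas(200_000)
    for bucket, count in sorted(histogram(deltas, bucket_ns=1_000).items()):
        print(f"{bucket:>10} ns  {count}")
    print("gaps over 50 us:", len(jitter_events(deltas, threshold_ns=50_000)))
```

Most samples fall into the lowest bucket (the cost of one loop iteration); the long tail of the histogram is the time taken away from the loop by interrupts, daemons and other activity.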
WebSphere XD (eXtended Deployment) Health Monitoring
Technical Staff Member, IBM India Research Lab, New Delhi, January 2005 - December 2006.
- This project aimed at improving problem determination and health monitoring in WebSphere XD, the latest WebSphere Application Server offering. I was part of a team spanning the Watson Research Lab in the US and the India Research Lab. Key Technologies: Core Java, WebSphere Application Server, DB2
- Led the effort for the design, implementation and delivery of a custom health framework for WebSphere eXtended Deployment (XD). This framework allows an administrator to specify any health problem as a boolean combination of static threshold violations and/or abrupt changes in the value of various runtime metrics such as response time, throughput, CPU utilization, memory utilization and number of requests queued.
- Developed sensors and the algorithm to detect a condition known as “Storm Drain” that results in the dynamic workload scheduler in a clustered environment scheduling the majority of requests to an application server that is executing requests much faster than other servers due to an application fault. Co-inventor of the patent filed on this system.
- Extended the health monitoring framework to work with non-WAS servers.
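A health condition of the kind described above, a boolean combination of static threshold violations and abrupt metric changes, might be expressed along these lines. This is a hypothetical sketch, not the WebSphere XD API: every function name, metric name and the abrupt-change rule (latest value exceeds a multiple of the recent average) is an assumption made for illustration.

```python
from typing import Callable, Dict, List

History = Dict[str, List[float]]      # metric name -> samples, oldest first
Condition = Callable[[History], bool]


def threshold(metric: str, limit: float) -> Condition:
    """Static threshold violation: the latest sample exceeds a fixed limit."""
    return lambda history: history[metric][-1] > limit


def abrupt_change(metric: str, window: int, factor: float) -> Condition:
    """Abrupt change (assumed rule): latest sample exceeds factor x the mean of the previous window."""
    def check(history: History) -> bool:
        series = history[metric]
        if len(series) <= window:
            return False  # not enough history to form a baseline
        baseline = sum(series[-window - 1:-1]) / window
        return baseline > 0 and series[-1] > factor * baseline
    return check


def all_of(*conditions: Condition) -> Condition:
    return lambda history: all(c(history) for c in conditions)


def any_of(*conditions: Condition) -> Condition:
    return lambda history: any(c(history) for c in conditions)


def negate(condition: Condition) -> Condition:
    return lambda history: not condition(history)


# Example policy: memory above 90%, OR response time jumps while requests queue up.
unhealthy = any_of(
    threshold("memory_utilization", 0.90),
    all_of(
        abrupt_change("response_time_ms", window=3, factor=2.0),
        threshold("queued_requests", 50),
    ),
)
```

Building conditions from small composable predicates keeps the policy declarative: an administrator-facing layer only needs to map its configuration onto `all_of`, `any_of` and `negate`.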
Symphony: Decentralized Orchestration of Composite Web Services
Technical Staff Member, IBM India Research Lab, New Delhi, June 2003 - December 2004.
- This project aims at improving the performance of workflow-based composite web services by decomposing the workflow into partitions that can be run in a decentralized setup on different hosts, thereby gaining performance from increased concurrency and reduced network traffic.
- Work involves implementing a Java-based prototype that partitions an input BPEL4WS specification and deploys the resulting partitions onto a workflow engine.
- Also responsible for development of appropriate runtime infrastructure for status monitoring and fault handling in a decentralized setup. This involved using the WebSphere MBean APIs and the Process Choreographer Java APIs for remote administration. Designed and implemented a generic fault handling and recovery framework for cooperative workflows that communicate with each other using asynchronous messaging. Co-inventor of the patent filed on this framework.
BasketLink™ - Portfolio Analytics and Reporting
IT Associate, Morgan Stanley, New York, June 2002 - May 2003.
- This application had an Excel front end written in Visual Basic and a set of complex back end C++ server processes communicating with the front end over CORBA and SOAP. The front end presented various statistics and reports on a portfolio and the back end processes did the various calculations. The back end processes also handled historical pricing data storage/retrieval with a Sybase database and with memory-mapped files for faster access.
- Reengineered and ported the backend C++ server process from Solaris to Linux. This included taking care of various issues arising due to differences in the threading model, signal handling, and byte ordering on Linux. Also worked on implementing new feature enhancements and reports.
- Ported a critical communications library used by all the server processes from the Unix sockets API to higher-level Morgan Stanley internal infrastructure libraries for network communication. This involved a thorough understanding of complex code written more than 6 years earlier and a comprehensive redesign of the threading model.
Hoteling Seats Reservation System
Distributed Systems Graduate Training Program, Morgan Stanley, New York, February 2002 - June 2002.
- This project was done at the end of the intensive four-month Distributed Systems training program covering Unix, Advanced C++, OOAD, Perl, SOAP, XML and basic financial concepts. It was a live project for managing the reservation of "hoteling seats" (temporary desks for roaming employees) at different Morgan Stanley locations, and the entire software development cycle had to be completed in one month. Designed and implemented a back end SOAP server in C++. This server communicated with a servlet/JSP-based web server over SOAP and also with a Sybase database.
DISCOVER - Distributed Interactive Steering and Collaborative Visualization EnviRonment
Research Assistant, Rutgers University, New Jersey, September 1999 - July 2001
- A web portal for steering and collaboratively interacting with high performance scientific applications running on a backend cluster. Developed the interaction and collaboration server using Java servlets and the client-server interaction protocol for collaboration among clients.
- Enhanced the server architecture for multiple DISCOVER servers using CORBA. This formed the core of my Master's thesis, Middleware Architecture for Integrated Computational Collaboratories (July 2001). It investigated the requirements for achieving interoperability among multiple collaboratories. Designed, implemented and experimentally evaluated a prototype middleware substrate that enabled interoperability among multiple, geographically distributed instances of the DISCOVER computational collaboratory and provided seamless access to remote servers through CORBA.
Dynamic content request scheduling algorithm and cooperative caching of dynamic web pages.
Research Assistant, Rutgers University, New Jersey, February 2000 - May 2000
- Caches the results of CGI execution in memory, based on the cacheability settings that the web server administrator specifies for each CGI script. The distributed, cluster-based web server serves cached results on subsequent requests for the same script. It takes advantage of a fast interconnect between nodes to access a file cached in a remote node’s memory, thereby avoiding disk access latencies. Each node broadcasts its caching information to all other nodes, and this information is used to enhance the scheduling algorithm for dynamic content requests. The scheduling algorithm uses static profiling of scripts by resource requirement (CPU, IO, Lightweight) to schedule requests onto different nodes of the cluster according to the load on each node for each type of script.
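One way cache information and per-class load could be combined in a scheduling decision is sketched below. The description above says only that cache information "enhances" the scheduler, so the specific policy here (prefer nodes already holding the cached result, then pick the least-loaded node for the script's resource class) is an assumption, as are all names in the code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    name: str
    load: Dict[str, int]                            # outstanding requests per class: cpu / io / light
    cached: Set[str] = field(default_factory=set)   # scripts whose results this node holds in memory


def schedule(script: str, profile: Dict[str, str], nodes: List[Node]) -> Tuple[str, bool]:
    """Return (node name, cache hit?) for a dynamic content request.

    profile maps each script to its statically profiled resource class.
    Assumed policy: restrict the choice to nodes that broadcast a cached copy,
    if any exist; otherwise consider every node. Within the candidates, pick
    the node with the lowest load for this script's resource class.
    """
    kind = profile[script]
    holders = [node for node in nodes if script in node.cached]
    candidates = holders or nodes
    target = min(candidates, key=lambda node: node.load[kind])
    return target.name, bool(holders)


nodes = [
    Node("a", {"cpu": 3, "io": 1, "light": 0}, cached={"report.cgi"}),
    Node("b", {"cpu": 1, "io": 5, "light": 0}),
]
profile = {"report.cgi": "cpu", "search.cgi": "io"}
```

With this data, `schedule("report.cgi", profile, nodes)` picks node `a` despite its higher CPU load because it holds the cached result, while `schedule("search.cgi", profile, nodes)` falls back to the least IO-loaded node.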
Binary rewriting of Java classes for secure access to untrusted code.
Research Assistant, Rutgers University, New Jersey, September 2000 - January 2001
- Automatic partitioning might be necessary for existing applications that run on public computers (e.g. kiosks) and handle confidential personal data (e.g. credit card numbers, passwords, etc.), so that the portions of the application that handle such confidential data run on a trusted device (e.g. a PDA) and remotely access the code running on an untrusted device (i.e. the kiosk). This project explored binary rewriting of Java classes at runtime to make an untrusted class remotely accessible through RMI. Used the freely available Java bytecode engineering tool "JOIE" to rewrite a set of Java classes so that they can be accessed remotely through RMI.
Documentation about these and some other projects that I did during my Master's at Rutgers can be obtained from my Rutgers page.
