Fossilization: Compliant reference storage solutions

Innovation Matters


The fundamental purpose of record-keeping is to establish irrefutable proof and accurate details of events that have occurred. However, critical records, such as business communications, financial statements and medical images, are increasingly stored in electronic form, which makes them relatively easy to destroy or modify clandestinely. The threat of intentional attacks, including attacks by insiders, is very real, given the extremely high stakes that can be involved in tampering with records. In the wake of recent corporate misconduct and the ensuing attempts to change history, a growing fraction of records is now subject to regulations (e.g. Sarbanes-Oxley Act, SEC Rule 17a-3/4, HIPAA, DOD 5015.2) on how they should be maintained. A 2003 study by the Enterprise Storage Group predicted that the worldwide volume of compliant records would increase by 64 percent per year to almost 2 petabytes in 2006.

Architecture Overview

Current industry practice and regulatory requirements (e.g. SEC Rule 17a-4) rely on storing records in Write Once Read Many (WORM) storage to preserve them. But this is increasingly inadequate to ensure that the records are trustworthy, i.e. able to provide solid proof and accurate details of past events. For example, given the large volume of records and the short query response times expected today, the records have to be indexed, but traditional indexing methods allow records, even those stored in WORM storage, to be effectively (logically) altered or deleted. Moreover, many records have long retention periods, so they must be periodically migrated to new storage systems, and each migration leaves them vulnerable. The records may even be susceptible to alteration during transit to the agent conducting an inquiry.
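To make the indexing risk concrete, here is a minimal Python sketch. The names `WormStore`, `ingest` and `search` are hypothetical and invented for illustration; they do not correspond to any IBM product or API. The sketch assumes a conventional rewritable index placed in front of an append-only store, and shows how an insider can logically alter a record without ever overwriting it.

```python
class WormStore:
    """Hypothetical append-only store: a written record is never overwritten."""

    def __init__(self):
        self._records = []

    def append(self, data: bytes) -> int:
        self._records.append(bytes(data))
        return len(self._records) - 1  # address of the new record

    def read(self, address: int) -> bytes:
        return self._records[address]


store = WormStore()
index = {}  # conventional rewritable index: keyword -> list of record addresses


def ingest(keyword: str, data: bytes) -> None:
    index.setdefault(keyword, []).append(store.append(data))


def search(keyword: str) -> list:
    return [store.read(address) for address in index.get(keyword, [])]


ingest("invoice", b"invoice 1001: $5,000")
ingest("invoice", b"invoice 1002: $9,000,000")

# Insider attack: append a forged record, then repoint the rewritable index.
forged_address = store.append(b"invoice 1002: $9,000")
index["invoice"][1] = forged_address

results = search("invoice")
print(results)        # the query now returns the forged invoice
print(store.read(1))  # the original still sits, untouched, in WORM storage
```

The original record is physically intact, yet no query that goes through the index will ever find it. Preserving the bytes alone is therefore not enough; the access path has to be protected as well.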


In this project, IBM Research takes a fresh, holistic approach to electronic record-keeping. We have developed a process called fossilization to ensure that records are trustworthy from an end-to-end perspective: from where records are kept to where they are received (such as by an agent performing an audit, a legal or regulatory discovery or an internal investigation). Fossilization is composed of three parts:

• Fossilization of storage guarantees that all of the records and their associated metadata are not only reliably stored for an extended period of time, but are also securely protected from any modification.
• Fossilization of discovery ensures that every preserved record that is relevant to an inquiry can be readily located and retrieved in a timely fashion.
• Fossilization of delivery warrants that exactly the retrieved records are delivered to the agent and that they are delivered verbatim.
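The article does not describe how fossilization of delivery is implemented. As one possible illustration of the goal (exactly the retrieved records, delivered verbatim), the following Python sketch uses a keyed digest manifest. The function names `make_manifest` and `verify_delivery`, and the assumption that the record keeper and the agent share a secret key, are hypothetical and are not taken from the project.

```python
import hashlib
import hmac


def _tag(digests, key: bytes) -> str:
    # HMAC over the ordered list of record digests.
    return hmac.new(key, "\n".join(digests).encode(), hashlib.sha256).hexdigest()


def make_manifest(records, key: bytes):
    """Sender side: digest each retrieved record and authenticate the list."""
    digests = [hashlib.sha256(record).hexdigest() for record in records]
    return digests, _tag(digests, key)


def verify_delivery(received, digests, tag: str, key: bytes) -> bool:
    """Receiver side: accept only the exact record set, byte for byte."""
    if not hmac.compare_digest(_tag(digests, key), tag):
        return False  # the manifest itself was tampered with
    if len(received) != len(digests):
        return False  # a record was dropped or injected
    return all(
        hashlib.sha256(record).hexdigest() == digest
        for record, digest in zip(received, digests)
    )


key = b"shared-secret"  # assumption: agreed between keeper and agent beforehand
retrieved = [b"memo A", b"memo B"]
digests, tag = make_manifest(retrieved, key)

print(verify_delivery([b"memo A", b"memo B"], digests, tag, key))  # intact
print(verify_delivery([b"memo A", b"memo X"], digests, tag, key))  # altered
print(verify_delivery([b"memo A"], digests, tag, key))             # dropped
```

A real deployment would more likely use digital signatures than a shared key, so that the agent can verify a delivery without being able to forge one.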

A key challenge in realizing this vision is that various aspects of today's systems are incompatible with the goal of preserving trust throughout a record's lifetime. Therefore, researchers have developed a comprehensive portfolio of trust-centric technologies cutting across traditional disciplines, such as databases, operating systems, computer architecture, security and packaging. These include Content Immutable Storage (CIS) to securely protect data from being overwritten, fossilized indices to prevent logical modification of records, trust-preserving migration to guard against record alteration during migration and unified auditing to detect potential inconsistencies across the solution stack. Since records typically contain sensitive information, researchers have also devised ways to preserve the privacy of the information throughout the records' lifetime. Moreover, as long as records exist, they are subject to discovery, typically at great expense to the owning organization. Thus, IBM Research also created techniques to effectively dispose of records that are no longer useful so that they cannot be recovered, even with the use of data forensics.

For example, there has been a lot of work on indexing techniques, including several methods designed for WORM storage, but these earlier techniques were not designed for trustworthy record-keeping. Specifically, they require dynamic adjustments to the index structure, which makes them vulnerable to logical modification of records. In general, any approach that requires the rebalancing of a tree is insecure because it allows an adversary to create new paths to records. Trees that grow in a bottom-up fashion are similarly exposed because an adversary could modify records at will by exploiting the provision for creating new versions of tree nodes. In addition, any method that permits index entries to be relocated is inherently not trustworthy because it opens the door for an adversary to create new versions of any entry. In this project, researchers have developed fossilized indices that are scalable and effective without requiring dynamic adjustments to their structure.
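The article does not give the structure of a fossilized index. The Python sketch below is only an illustration of the stated properties (no rebalancing, top-down growth, no relocation of entries), using a simple hash-directed tree. `WriteOnceHashTree`, `slot_for` and `FANOUT` are hypothetical names, and the write-once behavior is modeled in ordinary memory rather than enforced by storage hardware.

```python
import hashlib

FANOUT = 4  # slots per node; an arbitrary value chosen for illustration


class Node:
    def __init__(self):
        self.slots = [None] * FANOUT     # each slot is written at most once
        self.children = [None] * FANOUT  # each child link is set at most once


def slot_for(key: str, level: int) -> int:
    """Position of `key` at `level`, fixed by a hash and never by tree state."""
    digest = hashlib.sha256(f"{level}:{key}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % FANOUT


class WriteOnceHashTree:
    def __init__(self):
        self.root = Node()

    def insert(self, key: str, address: int) -> None:
        node, level = self.root, 0
        while True:
            i = slot_for(key, level)
            if node.slots[i] is None:
                node.slots[i] = (key, address)  # first and only write to this slot
                return
            if node.children[i] is None:
                node.children[i] = Node()       # grow downward; nothing is moved
            node, level = node.children[i], level + 1

    def lookup(self, key: str) -> list:
        found, node, level = [], self.root, 0
        while node is not None:
            i = slot_for(key, level)
            entry = node.slots[i]
            if entry is None:
                break  # inserts fill this slot before descending, so stop here
            if entry[0] == key:
                found.append(entry[1])
            node, level = node.children[i], level + 1
        return found


tree = WriteOnceHashTree()
for n in range(50):
    tree.insert(f"key{n}", n)
tree.insert("key7", 700)  # a later entry for the same key lands deeper
print(tree.lookup("key7"))
```

Because the path to an entry depends only on the key and the level, committed slots are never rewritten and no alternative path to a record can be created after the fact. An adversary can add entries, but the earlier ones remain on the lookup path.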

A further challenge in this project is to create a record-keeping solution that delivers end-to-end trust in a practical manner by using standard interfaces and building upon existing infrastructure. Researchers have developed a working prototype to demonstrate the feasibility of such a solution.

Related Publications  

Timothy Denehy and Windsor Hsu. Duplicate Management for Reference Data. IBM Research Report RJ 10305, October 2003.

Windsor Hsu, Lan Huang and Shauchi Ong. Content Immutable Storage: Truly Trustworthy and Cost-Effective Storage for Electronic Records. IBM Research Report RJ 10332, October 2004.

Windsor Hsu and Shauchi Ong. Fossilization: A Process for Establishing Truly Trustworthy Records. IBM Research Report RJ 10331, October 2004.



Innovator's corner  

Windsor Hsu, Researcher

What is the most exciting potential future use for the work you're doing?
The solutions we are working on can fundamentally change the way all records are kept. They provide the necessary safeguards to ensure that records remain trustworthy while allowing them to be stored, managed, searched, analyzed and processed conveniently, quickly and cost effectively. This means that we can both keep an accurate account of history and actually put the data collected to good use. Organizations will be able to gain effective control of their records and use them as a vital asset to derive new value.


What is the most interesting part of your research?
Talking to clients, analyzing competitors, understanding our strengths, and thinking hard to come up with effective ways to address client needs and leapfrog the competition. In other words, looking at the real world, figuring out the worthwhile battles, mapping them into technical challenges, solving the challenges, and creating the solution. In the process, we have a lot of fun and at the end of the day, we all feel good because we know we made a real difference. In this particular case, we do not just follow the crowd to provide basic WORM storage. Instead, we leverage our technical breadth and take a holistic approach to enable truly trustworthy electronic record keeping.


What inspired you to go into this field?
This is where the bits are going to be. Computer storage devices have improved so dramatically over the years, and we have become so information-driven, that it makes sense to keep records of everything electronically. Several studies show that the volume of fixed-content data is growing very rapidly and will soon exceed that of traditional transactional data.


What is your favorite invention of all time?
Sticky notes. They achieve their function with such elegant simplicity. I once saw somebody carrying a fancy PDA that was papered over with sticky notes.

Research team  

Ying Chen

Wayne Hineman

Xiaonan Ma

Shauchi Ong

Related Research  

Disciplines: Computer Science
Research Areas: Storage Systems
