The fundamental purpose of record-keeping is to establish irrefutable proof and accurate details of events that have occurred. However, critical records, such as business communications, financial statements and medical images, are increasingly stored in electronic form, which makes them relatively easy to destroy or modify clandestinely. The threat of intentional attacks, including attacks by insiders, is very real, given the extremely high stakes that could be involved in tampering with the records. With recent corporate misconduct and the ensuing attempts to change history, a growing fraction of records is now subject to regulations (e.g. Sarbanes-Oxley Act, SEC Rule 17a-3/4, HIPAA, DOD 5015.2) on how they should be maintained. A 2003 study by the Enterprise Storage Group predicted that the worldwide volume of compliant records would increase by 64 percent per year to almost 2 petabytes in 2006.
Architecture Overview
Current industry practice and regulatory requirements (e.g. SEC Rule 17a-4) rely on storing records in Write Once Read Many (WORM) storage to preserve them. But this is increasingly inadequate to ensure that the records are trustworthy, i.e. able to provide solid proof and accurate details of past events. For example, with the large volume of records and the short query response times expected today, the records have to be indexed, but traditional indexing methods allow records, even those stored in WORM storage, to be effectively (logically) altered and deleted. Moreover, many records have long retention periods, requiring them to be periodically migrated to new storage systems, and each migration is an opportunity for tampering. The records may even be susceptible to alteration during transit to the agent conducting an inquiry.
In this project, IBM Research takes a fresh holistic approach to electronic record-keeping. We have developed a process called fossilization to ensure that records are trustworthy from an end-to-end perspective, that is, from where records are kept to where records are received (such as by an agent performing an audit, a legal or regulatory discovery or an internal investigation). Fossilization is composed of three parts:
• Fossilization of storage guarantees that all of the records and their associated metadata are not only reliably stored for an extended period of time, but are also securely protected from any modification.
• Fossilization of discovery ensures that every preserved record that is relevant to an inquiry can be readily located and retrieved in a timely fashion.
• Fossilization of delivery ensures that exactly the retrieved records are delivered to the agent and that they are delivered verbatim.
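As an illustration of what fossilization of delivery asks for (exactly the retrieved records, delivered verbatim), the following is a minimal sketch and not the project's actual mechanism, which the text does not describe. It assumes the record keeper and the agent share a secret key out of band; the names `build_manifest` and `verify_delivery` and the manifest layout are hypothetical, and record identifiers are assumed not to contain colons or newlines.

```python
import hashlib
import hmac
from typing import Dict


def record_digest(record: bytes) -> str:
    """SHA-256 digest of one record's exact bytes."""
    return hashlib.sha256(record).hexdigest()


def _manifest_payload(digests: Dict[str, str]) -> bytes:
    # Canonical, order-independent encoding of the (id, digest) pairs.
    return "\n".join(f"{rid}:{d}" for rid, d in sorted(digests.items())).encode()


def build_manifest(records: Dict[str, bytes], key: bytes) -> dict:
    """Sender side: digest every retrieved record and authenticate the set."""
    digests = {rid: record_digest(body) for rid, body in records.items()}
    tag = hmac.new(key, _manifest_payload(digests), hashlib.sha256).hexdigest()
    return {"digests": digests, "tag": tag}


def verify_delivery(received: Dict[str, bytes], manifest: dict, key: bytes) -> bool:
    """Receiver side: accept only the exact record set, byte for byte."""
    digests = manifest["digests"]
    expected = hmac.new(key, _manifest_payload(digests), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["tag"]):
        return False  # the manifest itself was altered
    if set(received) != set(digests):
        return False  # a record was dropped or an extra one was added
    return all(record_digest(body) == digests[rid] for rid, body in received.items())
```

The check rejects a delivery if any record is missing, added or changed in transit. A shared-key HMAC is used here only to keep the sketch short; a deployment would more plausibly rely on digital signatures so that the agent does not need the keeper's secret.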
A key challenge in realizing this vision is that various aspects of today's systems are incompatible with the goal of preserving trust throughout a record's lifetime. Therefore, researchers have developed a comprehensive portfolio of trust-centric technologies cutting across traditional disciplines, such as databases, operating systems, computer architecture, security and packaging. These include Content Immutable Storage (CIS) to securely protect data from being overwritten, fossilized indices to prevent logical modification of records, trust-preserving migration to guard against record alteration during migration and unified auditing to detect potential inconsistencies across the solution stack. Since records typically contain sensitive information, researchers have also devised ways to preserve the privacy of the information throughout the records' lifetime. Moreover, as long as records exist, they are subject to discovery, typically at great expense to the owning organization. Thus, IBM Research also created techniques to effectively dispose of records that are no longer useful so that they cannot be recovered, even with the use of data forensics.
For example, there has been considerable work on indexing techniques, including several methods designed for WORM storage, but the previous techniques were not designed for trustworthy record keeping. Specifically, they require dynamic adjustments to the index structure, which makes them vulnerable to logical modification of records. In general, any approach that requires the rebalancing of a tree is insecure because it allows an adversary to create new paths to records. Trees that grow in a bottom-up fashion are similarly exposed because an adversary could modify records at will by exploiting the provision for creating new versions of tree nodes. In addition, any method that permits index entries to be relocated is inherently not trustworthy because it opens the door for an adversary to create new versions of any entry. In this project, researchers have developed fossilized indices that are scalable and effective without requiring dynamic adjustments to their structure.
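To make the constraints above concrete, here is a minimal sketch of an index that never rebalances, never creates new versions of nodes and never relocates an entry. It is an illustrative assumption, not the project's fossilized index, whose design the text does not give. The class names, the `node_size` parameter and the per-level hashing scheme are all hypothetical, and plain Python objects only emulate write-once behavior that real WORM storage would have to enforce.

```python
import hashlib
from typing import List, Optional, Tuple


def slot_for(key: str, level: int, size: int) -> int:
    """Slot position depends only on the key and the tree level."""
    digest = hashlib.sha256(f"{level}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


class WriteOnceNode:
    """Fixed-size node; each slot and child pointer is set at most once."""

    def __init__(self, size: int) -> None:
        self.slots: List[Optional[Tuple[str, str]]] = [None] * size
        self.children: List[Optional["WriteOnceNode"]] = [None] * size


class FossilizedIndexSketch:
    """Top-down hash tree: entries are committed in place and never moved."""

    def __init__(self, node_size: int = 4) -> None:
        self.node_size = node_size
        self.root = WriteOnceNode(node_size)

    def insert(self, key: str, record_id: str) -> None:
        node, level = self.root, 0
        while True:
            i = slot_for(key, level, self.node_size)
            if node.slots[i] is None:
                node.slots[i] = (key, record_id)  # the only write to this slot
                return
            if node.children[i] is None:
                node.children[i] = WriteOnceNode(self.node_size)  # grow downward
            node, level = node.children[i], level + 1

    def lookup(self, key: str) -> List[str]:
        """Return every record id committed under key, oldest first."""
        found: List[str] = []
        node, level = self.root, 0
        while node is not None:
            i = slot_for(key, level, self.node_size)
            entry = node.slots[i]
            if entry is None:
                break  # nothing for this key can exist deeper on the path
            if entry[0] == key:
                found.append(entry[1])
            node, level = node.children[i], level + 1
        return found
```

Because the path to an entry is fixed by the key and by what was already committed, a later insertion can only add entries further down the same path. It cannot hide or replace an earlier one, since `lookup` reports every match along that path.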
A further challenge in this project is to create a record-keeping solution that delivers end-to-end trust in a practical manner by using standard interfaces and building upon existing infrastructure. Researchers have developed a working prototype to demonstrate the feasibility of such a solution.
Related Publications
Timothy Denehy and Windsor Hsu. Duplicate Management for Reference Data. IBM Research Report RJ 10305, October 2003.
Windsor Hsu, Lan Huang and Shauchi Ong. Content Immutable Storage: Truly Trustworthy and Cost-Effective Storage for Electronic Records. IBM Research Report RJ 10332, October 2004.
Windsor Hsu and Shauchi Ong. Fossilization: A Process for Establishing Truly Trustworthy Records. IBM Research Report RJ 10331, October 2004.







