Preservation DataStores (PDS), a project of IBM Haifa, is an infrastructure component of CASPAR, a European Union project to preserve scientific, artistic and cultural artifacts. IBM researchers are building PDS to address the escalating need for specialized storage for long-term digital preservation.
The volume of digital information keeps growing. Indeed, most information today is "born digital." Alongside this trend, business, scientific, artistic and cultural needs demand that this information be stored and interpreted permanently. The convergence of these two trends has resulted in a need for storage systems that support very long-term preservation of the bits and interpretations of the objects represented by those bits.
That's why IBM Haifa built Preservation DataStores (PDS), a novel storage component that supports digital preservation environments. PDS will serve as an infrastructure component of CASPAR, a European Union project that is building a framework to support the end-to-end preservation "lifecycle" for scientific, artistic and cultural information.
Can we avert a digital Dark Ages?
We are facing a paradox. We can read and interpret the Dead Sea Scrolls, written roughly 2,000 years ago, but we cannot do the same with data written 20 years ago to a 5.25-inch floppy disk. Ironically, as the world becomes digital, we may be entering a digital "Dark Ages" in which business, public and personal assets are in ever greater danger of being lost. We can store digital bits, and we can interpret them as long as the environment doesn't change. But which current technologies can ensure storage and interpretation over long periods of time?
Effective storage technologies will need built-in support for preservation, which makes preservation environments more robust by lowering the probability of data corruption or loss. They will also have to preserve the understandability and usability of complex, interrelated objects even as computer hardware, operating systems, data management products and applications are replaced with newer ones, and as the data consumers (the designated communities) change frequently.
Standardizing digital preservation systems
A core standard for digital preservation systems is the Open Archival Information System (OAIS), an ISO standard since 2003 (ISO 14721:2003 OAIS) that specifies the terms, concepts and reference models for a system dedicated to preserving digital assets for a designated community.
The main OAIS concept relevant to storage is the Archival Information Package (AIP), depicted in Figure 1. An AIP contains the Content Information, which comprises the Content Data Object (the raw data that is the focus of the preservation) and the Representation Information needed to render the object intelligible to its designated community. Representation Information is recursive: it may itself require further Representation Information to be understood. It may include information about the hardware and software environment needed to view and interpret the Content Data Object.
Figure 1: OAIS AIP logical structure
The other part of an AIP is the Preservation Description Information (PDI), which is further divided into four sections:
• Reference (globally unique and persistent identifiers for the content data object).
• Provenance (chain of custody: the origin and history of the content information).
• Context (relationships of the content information to its environment).
• Fixity (a demonstration that the particular content information has not been altered in an undocumented manner).
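The AIP structure described above can be sketched as a small data model. This is only an illustration under stated assumptions: the class and field names (`AIP`, `PDI`, `RepresentationInfo`, `make_aip`) are hypothetical and do not come from OAIS or PDS, and SHA-256 is an assumed choice for the fixity value, since OAIS does not prescribe a particular algorithm.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RepresentationInfo:
    """Describes how to interpret some data (hypothetical model)."""
    description: str
    # Recursive: this description may itself need further description.
    interpreted_by: List["RepresentationInfo"] = field(default_factory=list)


@dataclass
class PDI:
    """Preservation Description Information, one field per section."""
    reference: str             # globally unique, persistent identifier
    provenance: List[str]      # ordered custody and history events
    context: Dict[str, str]    # relationship name -> related identifier
    fixity: str                # hex digest of the content data object


@dataclass
class AIP:
    content_data: bytes
    representation_info: List[RepresentationInfo]
    pdi: PDI

    def fixity_ok(self) -> bool:
        """Return True if the content still matches the recorded digest."""
        return hashlib.sha256(self.content_data).hexdigest() == self.pdi.fixity


def make_aip(identifier: str, data: bytes,
             rep_info: List[RepresentationInfo]) -> AIP:
    """Package raw data with its metadata, computing fixity at ingest time."""
    pdi = PDI(
        reference=identifier,
        provenance=["ingested"],
        context={},
        fixity=hashlib.sha256(data).hexdigest(),
    )
    return AIP(content_data=data, representation_info=rep_info, pdi=pdi)
```

In this sketch, a caller would build a package with `make_aip("urn:example:aip-1", data, [RepresentationInfo("plain text, ASCII")])` and later call `fixity_ok()` to detect undocumented changes to the content.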
Ensuring data usability and integrity over long periods of time
PDS is a novel storage architecture that has built-in support for long-term digital preservation based on OAIS. In contrast with traditional block or file storage, and even with traditional archival systems, PDS “materializes” the logical concept of a preservation information object, namely the AIP, into a physical storage object.
PDS encapsulates the raw data together with its complex, interrelated metadata objects, so that they remain inseparable during migration and future data access. It also reduces the volume of data transfers by offloading data-intensive functions, such as fixity computations, from applications to the storage environment. In addition, PDS simplifies applications by transferring responsibility for managing storage-related events, such as provenance tracking, to the storage environment itself.
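To illustrate the offloading idea, here is a minimal sketch of a store that computes fixity and records provenance events itself, so that the application receives only a small result instead of the full object. It is not the PDS interface: `ToyPreservationStore` and its methods are hypothetical names, the store is in-memory, and SHA-256 is an assumed digest.

```python
import hashlib
from datetime import datetime, timezone


class ToyPreservationStore:
    """In-memory stand-in for a preservation-aware store (hypothetical API)."""

    def __init__(self):
        # object id -> {"data": bytes, "digest": str, "provenance": list}
        self._objects = {}

    def _record(self, object_id, event):
        """Append a timestamped provenance event; done by the store, not the caller."""
        timestamp = datetime.now(timezone.utc).isoformat()
        self._objects[object_id]["provenance"].append((timestamp, event))

    def ingest(self, object_id, data):
        """Store the data and compute its reference digest on the storage side."""
        self._objects[object_id] = {
            "data": data,
            "digest": hashlib.sha256(data).hexdigest(),
            "provenance": [],
        }
        self._record(object_id, "ingest")

    def verify_fixity(self, object_id):
        """Recompute the digest next to the data; only a boolean leaves the store."""
        obj = self._objects[object_id]
        ok = hashlib.sha256(obj["data"]).hexdigest() == obj["digest"]
        self._record(object_id, "fixity check passed" if ok else "fixity check FAILED")
        return ok

    def provenance(self, object_id):
        """Return the recorded event names in order, without timestamps."""
        return [event for _, event in self._objects[object_id]["provenance"]]
```

The design point is that `verify_fixity` reads the whole object where it lives, while the application only sees `True` or `False`, and the provenance log grows as a side effect of storage operations rather than through application code.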
Finally, PDS handles migration internally, including the execution of externally specified logical transformations. PDS has a layered architecture built on open standards, including OAIS, XAM (eXtensible Access Method) and OSD (Object-based Storage Device), in keeping with the general design principle that preservation systems should employ open standards wherever possible (see Figure 2). A complementary services tool for assessing the maturity of digital repositories for long-term preservation is also under development.
Figure 2: Preservation DataStores architecture
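The idea of migration handled inside the store, driven by an externally specified transformation, can be sketched as follows. This is an assumption-laden illustration, not how PDS implements it: `new_object`, `migrate` and `latin1_to_utf8` are hypothetical names, objects are plain dictionaries, and a character-encoding change stands in for a real format transformation.

```python
import hashlib
from typing import Callable, Dict


def new_object(data: bytes) -> Dict:
    """Create a stored object with its fixity digest and an empty history."""
    return {
        "data": data,
        "fixity": hashlib.sha256(data).hexdigest(),
        "provenance": [],
    }


def migrate(obj: Dict, transform: Callable[[bytes], bytes],
            description: str) -> Dict:
    """Return a new version of obj produced by the supplied transformation."""
    # Verify the source first so corruption is not silently carried forward.
    if hashlib.sha256(obj["data"]).hexdigest() != obj["fixity"]:
        raise ValueError("source object failed its fixity check")
    new_data = transform(obj["data"])
    return {
        "data": new_data,
        # Fixity must be recomputed because the bits have changed.
        "fixity": hashlib.sha256(new_data).hexdigest(),
        # The migration itself becomes part of the object's history.
        "provenance": obj["provenance"] + ["migrated: " + description],
    }


def latin1_to_utf8(data: bytes) -> bytes:
    """Example transformation supplied from outside the store."""
    return data.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    source = new_object(b"caf\xe9")
    migrated = migrate(source, latin1_to_utf8, "Latin-1 to UTF-8")
    print(migrated["provenance"])
```

The caller supplies only the transformation and a description; checking the source, recomputing fixity and extending the provenance record all happen in one place, and the source object is left unmodified.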
Experts on this topic:
Simona Cohen: SIMONA@il.ibm.com
Dalit Naor: DALIT@il.ibm.com
Michael Factor: FACTOR@il.ibm.com
Related Publications
Simona Cohen, Dalit Naor, Leeat Ramati and Petra Reshef. Towards OAIS-Based Preservation Aware Storage - A White Paper. IBM Haifa Research Lab, November 2006.
Michael Factor, Dalit Naor, Simona Rabinovici-Cohen, Leeat Ramati, Petra Reshef, Julian Satran and Giaretta. Preservation DataStores: Architecture for Preservation Aware Storage. Proceedings of the IEEE Conference on Mass Storage Systems and Technologies (MSST). IEEE, September 2007.
Michael Factor, Dalit Naor, Simona Rabinovici-Cohen, Leeat Ramati, Petra Reshef and Julian Satran. The Need for Preservation Aware Storage - A Position Paper. ACM SIGOPS Operating Systems Review, Special Issue on File and Storage Systems 41(1), January 2007.
Last updated September 19, 2007









