Preservation DataStores: Storage assist for preservation environments

Innovation Matters


Preservation DataStores (PDS), a project of IBM Haifa, is an infrastructure component of CASPAR, a European Union project to preserve scientific, artistic and cultural artifacts. IBM researchers are building PDS to address the escalating need for specialized storage for long-term digital preservation.

The volume of digital information keeps growing. Indeed, most information today is "born digital." At the same time, business, scientific, artistic and cultural needs demand that this information remain stored and interpretable indefinitely. Together, these two trends have created a need for storage systems that support very long-term preservation of both the bits and the interpretation of the objects those bits represent.

That's why IBM Haifa built Preservation DataStores (PDS), a novel storage component that supports digital preservation environments. PDS will serve as an infrastructure component of CASPAR, a European Union project that is building a framework to support the end-to-end preservation "lifecycle" for scientific, artistic and cultural information.

Can we avert a digital Dark Ages?
We are facing a paradox. We can read and interpret the Dead Sea Scrolls, written almost 2,000 years ago, but we cannot do the same with data written 20 years ago to a 5.25-inch floppy disk. Ironically, as the world becomes digital, we may be entering a digital "Dark Ages" in which business, public and personal assets are in ever greater danger of being lost. We can store digital bits and interpret them as long as the environment doesn't change. But which current technologies can ensure storage and interpretation over long periods of time?

Effective storage technologies will need built-in support for preservation, which in turn will make preservation environments more robust, with a lower probability of data corruption or loss. They will also have to preserve the understandability and usability of complex, interrelated objects even as computer hardware, operating systems, data management products and applications are replaced by newer ones, and as the data consumers (the designated communities) change frequently.

Standardizing digital preservation systems
A core standard for digital preservation systems is the Open Archival Information System (OAIS), an ISO standard since 2003 (ISO 14721:2003 OAIS) that specifies the terms, concepts and reference models for a system dedicated to preserving digital assets for a designated community.

The main OAIS concept relevant to storage is the Archival Information Package (AIP), depicted in Figure 1. An AIP contains the Content Information, which comprises the Content Data Object (the raw data that is the focus of the preservation) and the Representation Information needed to render that object intelligible to its designated community. Representation Information is recursive: it may itself need further Representation Information in order to be understood. It may include information about the hardware and software environment needed to view and interpret the Content Data Object.

Figure 1: OAIS AIP logical structure

The other part of an AIP is the Preservation Description Information (PDI), which is further divided into four sections:

• Reference (globally unique and persistent identifiers for the content data object).
• Provenance (chain of custody: the history and origin of the content information).
• Context (relationships of the content information to its environment).
• Fixity (a demonstration that the particular content information has not been altered in an undocumented manner).
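
The AIP structure described above can be pictured as a small set of nested records. The following Python sketch is only an illustration of the OAIS concepts, not the PDS data model; all class and field names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RepresentationInfo:
    """Describes how to interpret some data; may itself need interpreting."""
    description: str
    # Recursive link: the information needed to understand this description.
    interpreted_by: Optional["RepresentationInfo"] = None

    def chain(self) -> List[str]:
        """Return the descriptions from this node to the end of the chain."""
        out, node = [], self
        while node is not None:
            out.append(node.description)
            node = node.interpreted_by
        return out


@dataclass
class PreservationDescriptionInfo:
    reference: str          # globally unique, persistent identifier
    provenance: List[str]   # chain-of-custody and history events
    context: List[str]      # relationships to the environment
    fixity: str             # evidence of integrity, e.g. a content digest


@dataclass
class ArchivalInformationPackage:
    content_data_object: bytes                 # the raw data being preserved
    representation_info: RepresentationInfo    # how to interpret that data
    pdi: PreservationDescriptionInfo           # the four PDI sections


# Hypothetical example: a CSV file whose format description relies on UTF-8.
rep = RepresentationInfo("CSV table", RepresentationInfo("UTF-8 text"))
pdi = PreservationDescriptionInfo(
    reference="urn:example:aip-1",
    provenance=["ingested"],
    context=[],
    fixity="(digest goes here)",
)
aip = ArchivalInformationPackage(b"a,b\n1,2\n", rep, pdi)
```

Walking `aip.representation_info.chain()` lists each layer of interpretation a future reader would need, which is the sense in which Representation Information is recursive.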

Ensuring data usability and integrity over long periods of time
PDS is a novel storage architecture that has built-in support for long-term digital preservation based on OAIS. In contrast with traditional block or file storage, and even with traditional archival systems, PDS “materializes” the logical concept of a preservation information object, namely the AIP, into a physical storage object.

PDS encapsulates the raw data together with its complex, interrelated metadata objects, so that they remain inseparable during migration and later data access. It also reduces the volume of data transfers by offloading data-intensive functions, such as fixity computations, from applications to the storage environment. PDS further simplifies applications by transferring responsibility for managing storage-related events, such as provenance, to the storage environment itself.
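
To make the offloading idea concrete, here is a minimal sketch of a store that computes fixity where the data lives and records provenance events itself. It is an in-memory stand-in written for illustration: the class `PreservationStore`, its methods and the choice of SHA-256 are assumptions, not the PDS interface.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, List


class PreservationStore:
    """Hypothetical in-memory stand-in for a preservation-aware store."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}
        self._provenance: Dict[str, List[str]] = {}

    def put(self, object_id: str, data: bytes) -> None:
        """Store an object and log the ingest event on the storage side."""
        self._objects[object_id] = data
        self._provenance[object_id] = []
        self._record(object_id, "ingest")

    def compute_fixity(self, object_id: str) -> str:
        """Hash the object in place; only the short digest leaves the store."""
        digest = hashlib.sha256(self._objects[object_id]).hexdigest()
        self._record(object_id, "fixity-check sha256")
        return digest

    def provenance(self, object_id: str) -> List[str]:
        """Return a copy of the events the store has recorded."""
        return list(self._provenance[object_id])

    def _record(self, object_id: str, event: str) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self._provenance[object_id].append(f"{stamp} {event}")


store = PreservationStore()
store.put("obj-1", b"hello")
digest = store.compute_fixity("obj-1")
```

The application receives a 64-character digest instead of reading the whole object back, and it never has to write provenance entries itself because the store appends them as a side effect of each operation.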

Finally, PDS handles migration internally, including the execution of externally specified logical transformations. PDS has a layered architecture built on open standards, including OAIS, XAM (eXtensible Access Method) and OSD (Object-based Storage Device), in keeping with the general design principle that preservation systems should employ open standards wherever possible (see Figure 2). A complementary services tool for assessing the maturity of digital repositories for long-term preservation is also under development.
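
One way to picture internal migration with an externally specified transformation is a function that takes the transformation as a parameter, applies it inside the store, and updates fixity and provenance in the same step. This is a sketch under stated assumptions: the dictionary layout, the `migrate` function and the `latin1_to_utf8` transformation are all hypothetical and chosen only for illustration.

```python
import hashlib
from typing import Callable, Dict


def migrate(package: Dict, transform: Callable[[bytes], bytes], note: str) -> Dict:
    """Return a new package whose content has been transformed.

    The original package is left untouched, so the old version stays available.
    """
    new_content = transform(package["content"])
    return {
        "reference": package["reference"],
        "content": new_content,
        # Fixity is recomputed because the bits have changed.
        "fixity": hashlib.sha256(new_content).hexdigest(),
        # The migration itself becomes part of the object's history.
        "provenance": package["provenance"] + [f"migrated: {note}"],
    }


def latin1_to_utf8(data: bytes) -> bytes:
    """Example of an externally supplied transformation: re-encode text."""
    return data.decode("latin-1").encode("utf-8")


old = {
    "reference": "urn:example:aip-1",
    "content": "caf\u00e9".encode("latin-1"),
    "fixity": hashlib.sha256("caf\u00e9".encode("latin-1")).hexdigest(),
    "provenance": ["ingest"],
}
new = migrate(old, latin1_to_utf8, "latin-1 to utf-8")
```

Because the transformation is passed in rather than built in, the store does not need to understand each format; it only guarantees that the content, fixity and provenance change together.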

Figure 2: Preservation DataStores architecture


Experts on this topic:
Simona Cohen: SIMONA@il.ibm.com
Dalit Naor: DALIT@il.ibm.com
Michael Factor: FACTOR@il.ibm.com


Related Publications  

Simona Cohen, Dalit Naor, Leeat Ramati and Petra Reshef. Towards OAIS-Based Preservation Aware Storage - A White Paper. IBM Haifa Research Lab, November 2006.

Michael Factor, Dalit Naor, Simona Rabinovici-Cohen, Leeat Ramati, Petra Reshef, Julian Satran and Giaretta. Preservation DataStores: Architecture for Preservation Aware Storage. Proceedings of the IEEE Conference on Mass Storage Systems and Technologies (MSST). IEEE, September 2007.

Michael Factor, Dalit Naor, Simona Rabinovici-Cohen, Leeat Ramati, Petra Reshef and Julian Satran. The Need for Preservation Aware Storage - A Position Paper. ACM SIGOPS Operating Systems Review, Special Issue on File and Storage Systems 41(1), January 2007.


Last updated September 19, 2007

Innovator's corner  

Simona Cohen, Researcher
What is the most exciting potential future use for the work you're doing?
I’m excited to help preserve the world’s digital assets and enable the preservation of today’s information for future generations, who will discover new things based upon that information. Preserving medical records correlated with genetic information, for example, will help people in the future apply new applications to older information, better understand the evolution of life and discover new treatments and medications.

What is the most interesting part of your research?
I’m interested in creating a new storage concept that is preservation-aware and supports OAIS information objects and functions. We offload intelligent preservation-related functionality from the server to the storage component and thereby provide a more optimized and robust solution.

What inspired you to go into this field?
I chose the storage field because it's similar to the funeral business: It will exist forever. Enhancing storage capabilities will have a big impact on the computer science field as almost every computerized device relies on a storage component. Moreover, there will always be opportunities for invention and innovation.

What is your favorite invention of all time?
The computer. It advanced almost every aspect of our lives, simplified processes and pushed the world’s knowledge to new levels.

Research team  

Shimon Agassi
Dalit Naor
Leeat Ramati
Petra Reshef
Julian Satran
Shahar Ronen
