IBM Israel Research Seminars

 

Kurzweil says computers will enable people to live forever, and doctors will be backing up your memories by late 2030. This talk is not about that, yet. Instead, the remarkable drop in disk costs makes it possible and attractive to retain past application states and store them for a long time for mining or auditing. A still-open question is how best to organize past-state storage. Split snapshots are a recent approach to past-state storage that is attractive for several reasons. Split snapshots are persistent, can be taken at high frequency, and are transactionally consistent. Unmodified database code can run against them. Unlike any other past-state storage approach, they provide low-cost discriminated garbage collection of snapshots. This capability is useful in long-lived systems: keeping all snapshots accessible indiscriminately is impractical even if raw disk storage is cheap, because administering such large-volume storage is expensive over long durations.

A number of novel techniques underlie split snapshots. A new in-memory data structure creates consistent copy-on-write snapshots without blocking, a new persistent data structure provides high-performance versioned metadata, and a new snapshot storage organization allows selected copy-on-write snapshots to be garbage collected gradually, without copying and without creating disk fragmentation. Measurements of a split snapshot prototype system indicate that the new techniques are efficient and scalable, imposing a minimal (4%) performance penalty on a storage system under expected common workloads.
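
For readers unfamiliar with copy-on-write snapshots, the general idea can be illustrated with a toy sketch. It is not the data structures presented in the talk: the class `CowStore` and all of its fields are hypothetical names, the store is single-threaded and in-memory, and it omits persistence, non-blocking operation, and garbage collection. It shows only that the current state and the archived pre-states are kept apart, and that a page's pre-state is copied at most once per declared snapshot.

```python
class CowStore:
    """Toy page store with copy-on-write snapshots (illustrative only)."""

    def __init__(self):
        self.current = {}    # page_id -> current value
        self.archive = {}    # page_id -> [(snapshot_id, pre_state)], ascending ids
        self.copied_at = {}  # page_id -> latest snapshot id already archived for
        self.latest = 0      # id of the most recently declared snapshot (0 = none)

    def snapshot(self):
        # Declaring a snapshot copies nothing; copying is deferred to writes.
        self.latest += 1
        return self.latest

    def write(self, page_id, value):
        # First write to a page after a snapshot: archive the pre-state once.
        if self.latest > 0 and self.copied_at.get(page_id, 0) < self.latest:
            pre_state = self.current.get(page_id)  # None means "page did not exist"
            self.archive.setdefault(page_id, []).append((self.latest, pre_state))
            self.copied_at[page_id] = self.latest
        self.current[page_id] = value

    def read(self, page_id, snapshot_id=None):
        if snapshot_id is None:
            return self.current.get(page_id)
        # The page as of snapshot s is the first pre-state archived for a
        # snapshot id >= s; if there is none, the page is unchanged since s.
        for sid, pre_state in self.archive.get(page_id, []):
            if sid >= snapshot_id:
                return pre_state
        return self.current.get(page_id)


store = CowStore()
store.write("a", 1)
s1 = store.snapshot()
store.write("a", 2)
s2 = store.snapshot()
store.write("a", 3)
store.write("b", 9)
```

In this sketch, reading page "a" as of `s1` or `s2` returns the value it held when that snapshot was declared, while a read without a snapshot id returns the current value.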

Joint work with Ross Shaull and Hao Xu.