ckp_util - Checkpointing Utilities
The CKP_UTIL package provides an application with the ability to save and restore its internal state. A long-running application can checkpoint intermediate states; should it crash, it can start up again, restore the most recent state, and continue from there.
The checkpointed states are saved in a checkpoint file, which is physically two files: the checkpoint file itself and an index file. The checkpoint "file" is created as follows:
#include "ckp_util.h"    /* Checkpoint utilities. */
CheckpointFile file ;
...
ckpOpen ("pathname", NULL, &file) ;
Named regions of memory containing an application's internal state must then be registered as checkpoints associated with the file:
Checkpoint checkpoint ;
struct {
... whatever ...
} internalState ;
...
ckpRegister (file, "name", &internalState, sizeof internalState, &checkpoint) ;
The information stored in the internalState structure can then be saved, at any time, to the checkpoint file:
ckpSave (checkpoint) ;
If an application has multiple memory regions registered as checkpoints, the current contents of all of them can be saved with a single call:
ckpSaveAll (file) ;
When a memory region is checkpointed, the contents of the region are appended to the checkpoint file, thereby creating a new version of that checkpoint's state. The most recent version or earlier versions of a checkpoint may be retrieved from the checkpoint file and restored to the memory region:
ckpRestore (checkpoint, version) ;
Because the restore may take place in a different run of the application, the memory region being restored may be at a different address from the one that was saved; be wary, therefore, of storing pointers in checkpointed regions. The version argument can be an absolute version number (1..N) or a relative version number (0 for the most recent, -1 for the next most recent, ..., -N+1 for the oldest).
All of an application's registered checkpoints may be restored at one time:
ckpRestoreAll (file, version) ;
When an application is done, it should close the checkpoint file(s):
ckpClose (file) ;
This automatically unregisters the file's registered checkpoints.
The writing of this package was inspired by the checkpointing capabilities found in AT&T's fault-tolerant library, described in "A Software Fault Tolerance Platform" by Yennun Huang and Chandra Kintala, in Practical Reusable Unix Software, edited by Balachander Krishnamurthy. I haven't read the book or that chapter; my knowledge of the checkpoint library was gathered from the man(1) page for the library. My package is simpler (and more understandable, I hope) than theirs, plus mine comes with the source code!
ckpClose() - closes a checkpoint file.
ckpLocate() - locates a checkpoint by name in a file's list of checkpoints.
ckpOpen() - opens a checkpoint file.
ckpRegister() - registers a named region of memory as a checkpoint.
ckpRestore() - restores a specified checkpoint from a checkpoint file.
ckpRestoreAll() - restores all of the registered checkpoints from the file.
ckpSave() - saves a specified checkpoint to a checkpoint file.
ckpSaveAll() - saves all of the registered checkpoints to the file.
ckpUnregister() - removes the registration of a checkpoint.
ckp_util.c
ckp_util.h