Moab Audit Failures
Jobs that regularly run audit tasks should not fail, from the Sidekiq/ActiveJob queue management perspective, when they detect errors. Instead, they should update the relevant status field in the database (e.g. complete_moabs.status, zip_parts.status) and send a Honeybadger alert. However, we sometimes fall short of this goal in the face of unforeseen failure modes, e.g. https://github.com/sul-dlss/preservation_catalog/issues/1696.
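The intended behavior can be illustrated with a minimal sketch in plain Ruby. This is not the application's actual code: AuditRecord, AlertService, ChecksumMismatch, and SketchAuditJob are hypothetical stand-ins (AlertService plays the role of the Honeybadger alert, AuditRecord the role of a database row with a status field), and the 'ok' status value is an assumption. The point is only that a detected problem is recorded and alerted on, while the job itself returns normally.

```ruby
# Hypothetical stand-ins so the sketch runs without Rails, Sidekiq, or the honeybadger gem.
AuditRecord = Struct.new(:druid, :status)

module AlertService
  def self.alerts
    @alerts ||= []
  end

  # In the real application this role is played by a Honeybadger alert.
  def self.notify(message)
    alerts << message
  end
end

class ChecksumMismatch < StandardError; end

class SketchAuditJob
  # `checker` is any callable that raises ChecksumMismatch when it finds a problem.
  def initialize(checker)
    @checker = checker
  end

  def perform(record)
    @checker.call(record)
    record.status = 'ok' # assumed name for the "no problem found" status
  rescue ChecksumMismatch => e
    # Record the finding and alert, but do not re-raise:
    # the queue sees a successfully completed job.
    record.status = 'invalid_checksum'
    AlertService.notify("#{record.druid}: #{e.message}")
  end
end

# Usage: the audit detects a problem, yet perform returns without raising.
failing_check = ->(_record) { raise ChecksumMismatch, 'manifest checksum mismatch' }
record = AuditRecord.new('druid:bb000cc1111', nil)
SketchAuditJob.new(failing_check).perform(record)
```

Only the anticipated error class is rescued here, so an unforeseen exception would still surface as a failed job in the queue, which is the kind of shortfall described above.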
If auditing alerts about possible corruption of a Moab that resides in a local storage root, e.g. if ChecksumValidator (or MoabToCatalog or CatalogToMoab) sends an alert or sets a status such as invalid_checksum, there is unfortunately no one-size-fits-all approach to remediation. These occurrences are fortunately rare, but remediation may require anything from undoing mistaken hand edits to a manifest file to decommissioning the Moab and re-accessioning the content, if the preserved content is deemed missing or corrupt (assuming the original content is still attainable).
For an example of this situation, see https://jirasul.stanford.edu/jira/browse/SDRO-391
If Moab content must be edited, new copies should be replicated to the cloud once the Moab is in a good state.