ZipmakerJob failures
NOTE: this page is a bit stale in the wake of preservation_catalog's switch from resque-pool to Sidekiq in 2023. The logic of dump_and_return_failure_queue_entries relied on some Resque methods, but could likely be ported to Sidekiq without too much trouble. dump_and_return_failure_queue_entries just got the job arguments for everything in the given failure queue. This should also be much less necessary since we've mostly resolved preservation storage availability problems in 2022 and 2023.
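As a rough illustration of the port mentioned in the note, here is a minimal sketch of a Sidekiq-era stand-in. The helper name `failed_job_entries` is hypothetical, and the sketch assumes failed jobs end up in Sidekiq's dead set (or retry set) under the job's own queue name rather than in a Resque-style `zipmaker_failed` queue. It relies only on each entry responding to `queue` and `args`, and returns hashes with an `:args` key so that code written against the old helper's output can stay mostly unchanged.

```ruby
# Hypothetical Sidekiq-era stand-in for dump_and_return_failure_queue_entries.
# `entries` is anything enumerable whose elements respond to #queue and #args,
# e.g. Sidekiq::DeadSet.new or Sidekiq::RetrySet.new (after `require 'sidekiq/api'`).
def failed_job_entries(entries, queue_name)
  entries
    .select { |entry| entry.queue == queue_name }
    .map { |entry| { args: entry.args } } # same :args shape the code on this page expects
end

# Possible console usage. Assumptions: failed ZipmakerJobs sit in the dead set,
# their queue is named 'zipmaker', and the jobs are ActiveJob-wrapped so the real
# arguments live under args[0]['arguments'].
#   require 'sidekiq/api'
#   failures = failed_job_entries(Sidekiq::DeadSet.new, 'zipmaker')
#   failure_queue_arg_lists = failures.map { |failure| failure[:args][0]['arguments'] }
```

Taking the entry collection as a parameter keeps the helper independent of which Sidekiq set the failures actually landed in, so the same call works for the retry set or the dead set.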
The code below can help to safely automate cleanup of failed ZipmakerJobs. Since ZipmakerJob (via DruidVersionZip) now attempts to delete any bad zip files it creates, this should rarely be necessary. But for cases where the cleanup failed, and the zip file hasn't aged out of temp space, some manual cleanup may be needed.
We can build on the above generalized code for wrangling failure queue info with some code for dealing with entries in zipmaker_failed.
failure_queue_arg_lists = dump_and_return_failure_queue_entries('zipmaker_failed').map { |failure| failure[:args][0]['arguments'] }
zm_druid_version_zips = failure_queue_arg_lists.map { |args| DruidVersionZip.new(args[0], args[1], args[2]) } ; nil
bad_druid_version_zips = zm_druid_version_zips.map do |dvz|
  if File.exist?(dvz.file_path)
    if dvz.send(:zip_size_ok?)
      puts "ok: zip exists at #{dvz.file_path}, but size is ok"
      nil
    else
      puts "ERROR: zip exists at #{dvz.file_path} but is smaller than moab version"
      dvz
    end
  else
    puts "ok: no cached zip exists at #{dvz.file_path}"
    nil
  end
end.compact ; nil
# it's possible that not all entries in the failure queue will have bad zips sticking around: for example, ZipmakerJob might've detected the moab
# was unreadable before trying to create the zip, the zip may have already aged out of temp space if the failed job is old enough, etc.
# use DruidVersionZip#cleanup_zip_parts! to handle the cleanup for us
bad_druid_version_zips.each { |bad_dvz| bad_dvz.send(:cleanup_zip_parts!) }
# now it should be fine to retry everything in queue.
# re-running the above `bad_druid_version_zips = zm_druid_version_zips.map do |dvz|...` code should result in an empty bad_druid_version_zips, since the cleanup loop should've removed any busted zip files.
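A quicker spot check than re-running the whole classification is to ask which of the flagged zips still have a file on disk. The sketch below is a hypothetical helper (`zips_still_on_disk` is not part of preservation_catalog) that assumes only that each object responds to `file_path`, as DruidVersionZip does in the code above. It checks the primary zip path only, not any additional segment files.

```ruby
# Hypothetical helper: returns the objects whose primary zip file is still on disk.
# Works with anything responding to #file_path (e.g. the DruidVersionZip instances above).
def zips_still_on_disk(druid_version_zips)
  druid_version_zips.select { |dvz| File.exist?(dvz.file_path) }
end

# Possible console usage after the cleanup loop:
#   leftover = zips_still_on_disk(bad_druid_version_zips)
#   puts "#{leftover.size} bad zip(s) still present" # ideally 0
```

If this returns anything, those entries probably need a closer manual look before the failed jobs are retried.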