Rename files with S3 illegal characters
- Rename files with S3 illegal characters to something that is S3 legal by replacing all illegal characters with a _ (underscore)
- Ensure there are no duplicate file names after the renaming by appending a (1), (2) at the end of the filename if the file has been renamed
- Keep a record of all of the file names as they originally existed and what they were renamed to
- The record goes into a file called files_renamed.txt, which contains a list of all files that have been renamed and what they were renamed to, along with a date.
- This files_renamed.txt file gets added to the dataset as a payload file
- Update the migration upload snapshot so it doesn't get confused about the files now having different names

Co-authored-by: Carolyn Cole <[email protected]>
1 parent 3dcad9e, commit e731591
Showing 8 changed files with 206 additions and 10 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,73 @@ | ||
# frozen_string_literal: true | ||
|
||
# We sometimes have data with filenames that contain characters that AWS S3 cannot handle. In those cases we want to: | ||
# 1. Rename the files to something that is AWS legal. Replace all illegal characters with a _ (underscore) | ||
# 2. Ensure there are no duplicate file names after the renaming by appending a (1), (2) at the end of the filename | ||
# if the file has been renamed | ||
# 3. Keep a record of all of the file names as they originally existed and what they were renamed to | ||
# 4. The record goes into a file called files_renamed.txt, which contains a list of all files that have been renamed | ||
# and what they were renamed to, along with a timestamp | ||
# 5. This files_renamed.txt file gets added to the dataset as a payload file, akin to a README.txt or license.txt | ||
class FileRenameMappingService | ||
attr_reader :upload_snapshot, :files, :renamed_files, :original_filenames | ||
|
||
def initialize(upload_snapshot:) | ||
@upload_snapshot = upload_snapshot | ||
@original_filenames = @upload_snapshot.files.map { |a| a["filename"] } | ||
@files = parse_files_to_rename | ||
@renamed_files = rename_files | ||
end | ||
|
||
def parse_files_to_rename | ||
@original_filenames.map { |original_filename| FileRenameService.new(filename: original_filename) } | ||
end | ||
|
||
# Make a hash containing all files that need renaming. | ||
# The key of the hash is the original filename. | ||
# The value of the hash is the new filename, with illegal characters replaced and an index number appended. | ||
def rename_files | ||
@upload_snapshot.with_lock do | ||
@upload_snapshot.reload | ||
rename_index = 1 | ||
renamed_files = {} | ||
@files.each do |file| | ||
next unless file.needs_rename? | ||
new_filename = file.new_filename(rename_index) | ||
renamed_files[file.original_filename] = new_filename | ||
# Also update the filename in the MigrationSnapshot | ||
@upload_snapshot.rename(file.original_filename, new_filename) | ||
rename_index += 1 | ||
end | ||
@upload_snapshot.save | ||
renamed_files | ||
end | ||
end | ||
|
||
# A rename is needed if any of the original filenames need renaming | ||
def rename_needed? | ||
@files.any?(&:needs_rename?) | ||
end | ||
|
||
# Format: "19 Sep 2023" (day, abbreviated month name, year) | ||
def rename_date | ||
Time.zone.now.strftime("%d %b %Y") | ||
end | ||
|
||
def renaming_document | ||
message = "Some files have been renamed to comply with AWS S3 storage requirements\n" | ||
message += "Rename date: #{rename_date}\n" | ||
message += "Original Filename\tRenamed File\n" | ||
@files.each do |file| | ||
next unless file.needs_rename? | ||
message += "#{file.original_filename}\t#{@renamed_files[file.original_filename]}\n" | ||
end | ||
message | ||
end | ||
end |
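The FileRenameService that this class depends on is not shown in this part of the diff. As a rough sketch only, the per-file behaviour implied by the spec (illegal characters replaced with `_`, index inserted before the extension) might look like the following. The class name `SketchFileRename` and the `ALLOWED_ONLY` pattern are assumptions made for illustration; the real list of S3-illegal characters may well differ.

```ruby
# Hypothetical stand-in for FileRenameService; not the code from this commit.
class SketchFileRename
  # ASSUMPTION: anything outside this whitelist is treated as illegal.
  # The actual service may use a different character list.
  ALLOWED_ONLY = %r{[^A-Za-z0-9 ._/()\-]}

  attr_reader :original_filename

  def initialize(filename:)
    @original_filename = filename
  end

  # True when at least one character falls outside the assumed whitelist.
  def needs_rename?
    ALLOWED_ONLY.match?(@original_filename)
  end

  # Replace each illegal character with "_" and insert "(index)" before the extension.
  def new_filename(index)
    ext = File.extname(@original_filename)
    base = @original_filename.delete_suffix(ext)
    "#{base.gsub(ALLOWED_ONLY, '_')}(#{index})#{ext}"
  end
end

file = SketchFileRename.new(filename: "10.34770/tbd/4/Dry He 2mm 20kV le=0.8mJ RH 50%.csv")
file.new_filename(2)
# => "10.34770/tbd/4/Dry He 2mm 20kV le_0.8mJ RH 50_(2).csv"
```

Putting the index before the extension keeps the file type recognisable after the rename, which matches the expected value in the spec for this commit.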
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# frozen_string_literal: true | ||
|
||
FactoryBot.define do | ||
factory :migration_upload_snapshot do | ||
url { "https://localhost.localdomain/file.txt" } | ||
version { 1 } | ||
work { FactoryBot.create(:approved_work) } | ||
files { [] } | ||
|
||
factory :migration_upload_snapshot_with_illegal_characters do | ||
files do | ||
[ | ||
{ | ||
"filename" => "10.34770/tbd/4/laser width.xlsx", | ||
"checksum" => "dGFh+f5CnwifPlEhkT1Amg==", | ||
"migrate_status" => "started" | ||
}, | ||
{ | ||
"filename" => "10.34770/tbd/4/all OH LIF decays.xlsx", | ||
"checksum" => "oCovyV5XT+jNMsDbUpP/xA==", | ||
"migrate_status" => "started" | ||
}, | ||
{ | ||
"filename" => "10.34770/tbd/4/Dry He 2mm 10kV le=0.8mJ RH 50%.csv", | ||
"checksum" => "4sUs+2GkGPPFHgjyY3NsPw==", | ||
"migrate_status" => "started" | ||
}, | ||
{ | ||
"filename" => "10.34770/tbd/4/Dry He 2mm 20kV le=0.8mJ RH 50%.csv", | ||
"checksum" => "nY0PImdocFIffUu0oAIpoA==", | ||
"migrate_status" => "started" | ||
} | ||
] | ||
end | ||
end | ||
end | ||
end |
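The factory above shows the shape of each entry in `files`: a hash with `"filename"`, `"checksum"` and `"migrate_status"` keys. The snapshot's `rename` method called by FileRenameMappingService is not part of this excerpt, so the following is only a guess at the kind of update it performs. The helper name `rename_entry` is hypothetical.

```ruby
# Hypothetical helper; the real MigrationUploadSnapshot#rename is not shown here.
# Returns a new array in which the entry matching old_filename carries new_filename,
# leaving checksum and migrate_status untouched.
def rename_entry(files, old_filename, new_filename)
  files.map do |entry|
    entry["filename"] == old_filename ? entry.merge("filename" => new_filename) : entry
  end
end

files = [
  { "filename" => "10.34770/tbd/4/laser width.xlsx", "checksum" => "dGFh+f5CnwifPlEhkT1Amg==", "migrate_status" => "started" },
  { "filename" => "10.34770/tbd/4/Dry He 2mm 10kV le=0.8mJ RH 50%.csv", "checksum" => "4sUs+2GkGPPFHgjyY3NsPw==", "migrate_status" => "started" }
]

updated = rename_entry(
  files,
  "10.34770/tbd/4/Dry He 2mm 10kV le=0.8mJ RH 50%.csv",
  "10.34770/tbd/4/Dry He 2mm 10kV le_0.8mJ RH 50_(1).csv"
)
```

Keeping the checksum unchanged is what lets the migration still verify the file contents after the name changes.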
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,44 @@ | ||
# frozen_string_literal: true | ||
require "rails_helper" | ||
|
||
RSpec.describe FileRenameMappingService do | ||
let(:upload_snapshot) { FactoryBot.create(:migration_upload_snapshot_with_illegal_characters) } | ||
subject { described_class.new(upload_snapshot: upload_snapshot) } | ||
let(:file_needing_rename) { "10.34770/tbd/4/Dry He 2mm 20kV le=0.8mJ RH 50%.csv" } | ||
let(:file_not_needing_rename) { "10.34770/tbd/4/laser width.xlsx" } | ||
|
||
it "has an upload snapshot" do | ||
expect(subject.upload_snapshot).to eq upload_snapshot | ||
end | ||
|
||
it "has an array of FileRenameService objects" do | ||
expect(subject.files.count).to eq 4 | ||
expect(subject.files.first).to be_instance_of FileRenameService | ||
end | ||
|
||
it "has a hash of renamed files" do | ||
expect(subject.renamed_files).to be_instance_of Hash | ||
end | ||
|
||
it "adds sequential numbers when it renames files" do | ||
expect(subject.renamed_files[file_needing_rename]).to eq "10.34770/tbd/4/Dry He 2mm 20kV le_0.8mJ RH 50_(2).csv" | ||
end | ||
|
||
it "has a list of original filenames" do | ||
original_filenames = [ | ||
"10.34770/tbd/4/laser width.xlsx", | ||
"10.34770/tbd/4/all OH LIF decays.xlsx", | ||
"10.34770/tbd/4/Dry He 2mm 10kV le=0.8mJ RH 50%.csv", | ||
"10.34770/tbd/4/Dry He 2mm 20kV le=0.8mJ RH 50%.csv" | ||
] | ||
expect(subject.original_filenames).to eq original_filenames | ||
end | ||
|
||
it "knows whether it needs to rename any files" do | ||
expect(subject.rename_needed?).to eq true | ||
end | ||
|
||
it "produces a mapping of all the file renaming" do | ||
expect(subject.renaming_document).to match(/Some files have been renamed/) | ||
end | ||
end |
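To see why the spec expects `(2)` for the 20kV file, it may help to trace the index assignment outside Rails. The sketch below mimics the loop in `rename_files` and the layout of `renaming_document` using plain Ruby; `ILLEGAL_CHARS`, `build_rename_map` and `build_document` are illustrative names, and the character pattern is an assumption rather than the project's actual list.

```ruby
# ASSUMPTION: characters outside this whitelist count as S3-illegal.
ILLEGAL_CHARS = %r{[^A-Za-z0-9 ._/()\-]}

# Map original filename => new filename. The index only advances for files
# that are actually renamed, so untouched files do not consume a number.
def build_rename_map(filenames)
  index = 1
  filenames.each_with_object({}) do |name, renamed|
    next unless ILLEGAL_CHARS.match?(name)
    ext = File.extname(name)
    renamed[name] = "#{name.delete_suffix(ext).gsub(ILLEGAL_CHARS, '_')}(#{index})#{ext}"
    index += 1
  end
end

# Tab-separated record in the same layout as renaming_document.
def build_document(renamed, date)
  lines = [
    "Some files have been renamed to comply with AWS S3 storage requirements",
    "Rename date: #{date}",
    "Original Filename\tRenamed File"
  ]
  renamed.each { |original, new_name| lines << "#{original}\t#{new_name}" }
  lines.join("\n") + "\n"
end

filenames = [
  "10.34770/tbd/4/laser width.xlsx",
  "10.34770/tbd/4/all OH LIF decays.xlsx",
  "10.34770/tbd/4/Dry He 2mm 10kV le=0.8mJ RH 50%.csv",
  "10.34770/tbd/4/Dry He 2mm 20kV le=0.8mJ RH 50%.csv"
]

rename_map = build_rename_map(filenames)
puts build_document(rename_map, "19 Sep 2023")
```

With the factory's four files, only the two `.csv` names contain `=` and `%`, so they receive `(1)` and `(2)` in order while the two `.xlsx` files are left alone.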