Problems I experienced with Google Takeout and how I worked around them
When I purchased the mulcahy.ca domain name I thought it would be a good idea to use Google Apps For Your Domain. Basically, I got something that looked like a normal Google account - gmail, etc. - but my login was [redacted]@mulcahy.ca instead of [redacted]@gmail.com. There were some frustrations over the years, but I was able to live with them. My takeaway from the experience is that it's better to use the mass-market product than a niche product. If you want things to just work, choose Toyota over Ferrari.
Google Apps For Your Domain went through multiple renames, and each rename seemed to come with a price increase. I found some details here. I don't believe I ever used the free product - maybe I was initially paying $50/year. Now the product is called Google Workspace and I'm paying $40 - per month! The cost has increased by nearly 10x. It seems to be aimed at small business owners rather than hobbyists.
I host my blog for free on GitHub Pages, and my domain registrar provides email forwarding for $5/year, so I've wanted to leave Google Workspace and save some money for a long time. In theory, it should be easy: Google Takeout advertises the ability to download all of your data. In practice, not so much.
The data I downloaded from Takeout was incomplete. Apparently I'm not the only one. I downloaded folders from Google Drive and some files in subfolders were missing. This is a big problem for me because I have important records I can't afford to lose. I'm fortunate that I noticed the problem before deleting the original data.
When I used Takeout to download photos from Google Photos, the timestamps were missing. This was a big problem for me because the photos of my children as babies were in my Google Workspace account, and I really wanted to know when each photo was taken.
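I ended up solving this with Partner Sharing (below) rather than Takeout, but for what it's worth: my understanding is that Takeout photo exports include a JSON sidecar per photo containing a photoTakenTime field. Here's a rough sketch that reapplies it as the file's modification time - the sidecar naming is an assumption, so verify it against your own export, and note that this doesn't touch EXIF data:

import json
import os
from pathlib import Path

def restore_photo_timestamps(export_dir):
    # For each sidecar like IMG_1234.jpg.json, set the matching photo's
    # mtime to the photoTakenTime recorded by Google (epoch seconds)
    for sidecar in Path(export_dir).rglob('*.json'):
        photo = sidecar.with_suffix('')  # strip the trailing .json
        if not photo.exists():
            continue
        meta = json.loads(sidecar.read_text())
        taken = meta.get('photoTakenTime', {}).get('timestamp')
        if taken:
            os.utime(photo, (int(taken), int(taken)))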
I had lots of gdocs, and Takeout automatically converted them to docx. Some of the file and folder names were changed slightly. This is not as much of a deal-breaker as the first two problems, but it still makes me unhappy: I wanted to re-upload the files into my personal Gmail account, and they should have been able to remain gdocs. This whole debacle does make me question whether I should be using gdocs at all, but I'll defer that question to a later time.
+I used "Partner Sharing" to share all of my photos with my personal Gmail +account, and then in that personal account I chose to save all photos to my +account. This worked quite well and really wasn't too hard. The only thing I +didn't like is that it didn't tell me when sharing was complete, so I don't +really know when it's safe to delete the originals, but probably it's okay a +day later? Make sure you choose the option to save the shared photos to your +account.
I couldn't find an easy solution for Google Drive. There is a setting to transfer ownership, but it doesn't allow you to transfer outside of your "organization". You can share folders with people outside of your organization, but you can't transfer ownership to them.
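For reference, within an organization the transfer is just a permission change via the Drive API, roughly like the sketch below. It wasn't usable for me because my personal account is outside the organization; file_id and the email address here are placeholders:

# Sketch: hand ownership of one file to another user. Drive rejects this
# when the new owner is outside your Workspace organization.
service.permissions().create(
    fileId=file_id,  # placeholder: the file to transfer
    body={'type': 'user', 'role': 'owner', 'emailAddress': 'new-owner@example.com'},
    transferOwnership=True,
).execute()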
I also tried syncing the data with Google Drive for Desktop and then copying it manually. This lost the Drive-native files, like gdocs: they are synced as pointers to the original files, which are still owned by my Workspace account.
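As far as I can tell, a synced gdoc is just a tiny JSON pointer file, something like this (contents approximate, with a placeholder ID):

{"url": "https://docs.google.com/document/d/FILE_ID/edit", "doc_id": "FILE_ID"}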
In the end, I moved all of my content into a folder named to-share and shared that folder with my personal account. Then I had ChatGPT write me some Python to copy the folder recursively. I had to generate a credentials.json file, which was annoying and which I'd rather not have done with my personal account, but I'll delete the credentials once the operation is complete.
ChatGPT's code kind of worked, but I had to tweak it to preserve file names and metadata. Then I noticed that it was only copying the first 100 files in large folders, so I had to change it to handle pagination. It's worth double-checking the results (see the verification sketch after the script). Here's the code in case it helps someone else.
I find it absurd how difficult this is, but I think it's unlikely there was any ill intent. Google probably decided not to allow ownership transfer outside of the organization because it would be rather catastrophic if you transferred ownership accidentally.
import os

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# If you change the scope later, delete token.json so a fresh token is issued
SCOPES = ['https://www.googleapis.com/auth/drive']
# Authenticate and create the service
def authenticate():
    """Authenticate and return the Drive service."""
    creds = None
    # token.json stores the user's access and refresh tokens; it is created
    # automatically when the authorization flow completes for the first time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=8080)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())

    service = build('drive', 'v3', credentials=creds)
    return service
def get_folder_contents(service, folder_id):
    """Get all files and subfolders in a folder, handling pagination."""
    items = []
    page_token = None

    while True:
        # List the folder's children one page at a time, skipping trashed files
        results = service.files().list(
            q=f"'{folder_id}' in parents and trashed = false",
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=page_token
        ).execute()

        # Add the files from this page to the list of items
        items.extend(results.get('files', []))

        # Check if there is another page of results
        page_token = results.get('nextPageToken')
        if not page_token:
            break  # No more pages, exit the loop

    print(f"returning {len(items)} items")

    return items
def copy_file(service, file_id, folder_id):
    """Copy a file to the new folder, preserving the original file name."""
    # Get the file's metadata to preserve its original name
    file = service.files().get(fileId=file_id, fields='name').execute()
    file_name = file['name']

    # Prepare the metadata for the copy operation
    file_metadata = {'name': file_name, 'parents': [folder_id]}

    # Copy the file to the new folder
    copied_file = service.files().copy(
        fileId=file_id, body=file_metadata,
        fields='id, name, mimeType').execute()
    print(f"Copied file: {copied_file['name']} - original_id={file_id}, "
          f"new_id={copied_file['id']}, mimeType={copied_file['mimeType']}")

    return copied_file['id']
# After copying a file, restore its timestamps
def restore_timestamps(service, copied_file_id, original_file_id):
    original_file = service.files().get(
        fileId=original_file_id, fields='createdTime, modifiedTime').execute()
    modified_time = original_file['modifiedTime']

    # Google Drive doesn't allow setting createdTime directly on an update,
    # but the original modifiedTime can be copied across
    updated_file_metadata = {'modifiedTime': modified_time}
    service.files().update(fileId=copied_file_id, body=updated_file_metadata).execute()

    print(f"Restored timestamps for copied file {copied_file_id}")
# Copy a folder
def copy_folder(service, source_folder_id, destination_folder_id):
    """Recursively copy a folder and its contents."""
    items = get_folder_contents(service, source_folder_id)

    for item in items:
        if item['mimeType'] == 'application/vnd.google-apps.folder':
            # Folders can't be copied directly; create a matching folder in
            # the destination and recurse into it
            folder_metadata = {'name': item['name'],
                               'mimeType': 'application/vnd.google-apps.folder',
                               'parents': [destination_folder_id]}
            new_folder = service.files().create(body=folder_metadata, fields='id, name').execute()
            print(f"Created folder: {new_folder['name']}")
            restore_timestamps(service, copied_file_id=new_folder['id'], original_file_id=item['id'])
            # Recursively copy the contents of this folder
            copy_folder(service, item['id'], new_folder['id'])
        else:
            # Copy the file to the destination folder
            copied_file_id = copy_file(service, item['id'], destination_folder_id)
            restore_timestamps(service, copied_file_id=copied_file_id, original_file_id=item['id'])
# Main function to copy a folder
def copy_drive_folder(source_folder_id, destination_folder_id):
    service = authenticate()
    copy_folder(service, source_folder_id, destination_folder_id)

if __name__ == '__main__':
    source_folder_id = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'  # Replace with your source folder ID
    destination_folder_id = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'  # Replace with your destination folder ID
    copy_drive_folder(source_folder_id, destination_folder_id)
You'll also need to install the dependencies with pip:

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
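To double-check the results, I'd suggest comparing file counts between the two trees. Here's a quick sketch reusing the functions above (written for this post, not part of the original script):

def count_tree(service, folder_id):
    """Recursively count the non-folder files under a folder."""
    total = 0
    for item in get_folder_contents(service, folder_id):
        if item['mimeType'] == 'application/vnd.google-apps.folder':
            total += count_tree(service, item['id'])
        else:
            total += 1
    return total

# After the copy finishes, the two counts should match
service = authenticate()
print(count_tree(service, source_folder_id), "files in the source tree")
print(count_tree(service, destination_folder_id), "files in the destination tree")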
As for email, I didn't really have much of a problem because I've been forwarding my email from my Google Workspace account to my personal account for years. I've also set up OfflineIMAP so I can keep local copies of my emails.
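If you want to set that up yourself, a minimal ~/.offlineimaprc looks roughly like this - the account name and folder path are placeholders, and note that Gmail logins now need OAuth2 or an app password rather than your normal password:

[general]
accounts = Personal

[Account Personal]
localrepository = PersonalLocal
remoterepository = PersonalRemote

[Repository PersonalLocal]
type = Maildir
localfolders = ~/Mail/personal

[Repository PersonalRemote]
type = Gmail
remoteuser = [redacted]@gmail.com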
I don't think there's anything else I need in the account, but I used Google Takeout to download everything else just in case.
If you enjoyed this post, please let me know on Twitter or Mastodon.
Posted December 27, 2024.
Tags: #python