diff --git a/index.html b/index.html
index ba7df9b..6b85ae1 100644
--- a/index.html
+++ b/index.html
@@ -98,6 +98,12 @@


+ Frustrations with Google Takeout
CLIs with updating statuses
diff --git a/posts/frustrations-with-google-takeout.html b/posts/frustrations-with-google-takeout.html
new file mode 100644
index 0000000..be9624b
--- /dev/null
+++ b/posts/frustrations-with-google-takeout.html
@@ -0,0 +1,324 @@
+ Frustrations with Google Takeout

Frustrations with Google Takeout


Problems I experienced with Google Takeout and how I worked around them


When I purchased the mulcahy.ca domain name I thought it would be a good idea to use Google Apps For Your Domain. Basically I got something that looked like a normal Google account (Gmail, etc.), but my login was [redacted]@mulcahy.ca instead of [redacted]@gmail.com. There were some frustrations over the years, but I was able to live with them. My takeaway from the experience is that it's better to use the mass-market product than a niche product. If you want things to just work, choose Toyota over Ferrari.


Google Apps For Your Domain went through multiple renames and each rename seemed to come with a price increase. I found some details here. I don't believe I ever used the free product; maybe I was initially paying $50/year. Now the product is called Google Workspace and I'm paying $40 per month! That's $480/year, so the cost has increased by nearly 10x. It seems to be aimed at small business owners rather than hobbyists.


I host my blog for free on GitHub Pages and my domain registrar provides email forwarding for $5/year, so I've wanted to leave Google Workspace and save some money for a long time. In theory, it should be easy. Google Takeout advertises the ability to download all of your data. In practice, not so much.


Takeout Problem #1: Incomplete data


The data I downloaded from Takeout was incomplete. Apparently I'm not the only one. I downloaded folders from Google Drive and some files in subfolders were missing. This is a big problem for me because I have important records I can't lose. I'm fortunate that I noticed the problem before deleting the original data.
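
A check along these lines can help catch missing files before the originals are deleted. This is only a minimal sketch under assumptions: the function name `missing_from_export` is hypothetical, and it presumes you already have a list of the relative paths you expect to find. Paths are compared without their extension because Takeout may convert file types, so names that contain extra dots or that Takeout renamed will need a manual look.

```python
from pathlib import Path, PurePosixPath


def missing_from_export(expected_paths, export_root):
    """Return the expected relative paths that have no match under export_root.

    Paths are compared without their file extension, because the export may
    convert file types (for example a Google Doc becomes a .docx file).
    """
    root = Path(export_root)
    # Every file found in the export, as a relative POSIX path without suffix.
    found = {
        p.relative_to(root).with_suffix("").as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }
    # Keep only the expected paths whose extension-less form was not found.
    return sorted(
        path
        for path in expected_paths
        if PurePosixPath(path).with_suffix("").as_posix() not in found
    )
```

Calling `missing_from_export(expected, "path/to/extracted/takeout")` returns a sorted list of the paths that need a closer look; an empty list means every expected path had a match.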


Takeout Problem #2: Missing metadata


When I used Takeout to download photos from Google Photos, the timestamps were missing. This was a big problem for me because my photos of my children as babies were in my Google Workspace account. I really wanted to know when each photo was taken.


Takeout Problem #3: File conversion


I had lots of gdocs and Takeout automatically converted them to docx. Some of the file/folder names got changed slightly. This is not as much of a deal-breaker as the first two, but it still makes me unhappy. I wanted to re-upload the Takeout files into my personal Gmail account, so they should be able to remain as gdocs. That said, this whole debacle makes me question whether I should be using gdocs at all. I will defer that to a later time.


Google Photos Solution


I used "Partner Sharing" to share all of my photos with my personal Gmail account, and then in that personal account I chose to save all photos to my account. This worked quite well and really wasn't too hard. The only thing I didn't like is that it didn't tell me when sharing was complete, so I don't really know when it's safe to delete the originals, but it's probably okay a day later? Make sure you choose the option to save the shared photos to your account.


Google Drive Solution


I couldn't find an easy solution for Google Drive. There is a setting to transfer ownership but it doesn't allow you to transfer outside of your "organization". You can share folders with people outside of your organization, but you can't transfer ownership.


I also tried syncing data with Google Drive for Desktop and then manually copying. This lost the Drive-native files, such as gdocs. The problem is that these are synced as pointers to the original files, which are still owned by my Workspace account.
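
To see which files a manual copy would lose, one option is to scan the synced folder for pointer files. This is a minimal sketch, assuming the pointers use the `.gdoc`, `.gsheet`, and `.gslides` extensions (that list is an assumption and may be incomplete); `find_pointer_files` is a hypothetical helper name.

```python
from pathlib import Path

# Extensions assumed to mark Google-native pointer files in a synced folder.
# This set is an assumption and may not be complete.
POINTER_SUFFIXES = {".gdoc", ".gsheet", ".gslides"}


def find_pointer_files(sync_root):
    """Return relative paths of Google-native pointer files under sync_root."""
    root = Path(sync_root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in POINTER_SUFFIXES
    )
```

Anything this returns is a file whose real content still lives in the original account, so copying the local file alone would not preserve it.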


I moved all of my content into a folder named to-share and shared that with my personal account. Then I had ChatGPT write me some Python to copy that folder recursively. I had to generate a credentials.json file, which was annoying and something I'd rather not have done with my personal account. But I'll delete the credentials after the operation is complete.


ChatGPT's code kind of worked, but I had to tweak it to preserve file names and metadata. Then I noticed that it was only copying the first 100 files in large folders, so I had to change it to handle pagination. It's worth double-checking the results. Here's the code in case it helps someone else.


I find it absurd how difficult this is, but I think it's unlikely there was any ill intent. Google probably made the decision to not allow ownership transfer outside of the organization because it'd be rather catastrophic if you transferred ownership accidentally.

import os

from google.auth.transport.requests import Request
from google.oauth2.credentials import Credentials
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# If you change the scopes, delete token.json so the authorization flow runs again
SCOPES = ['https://www.googleapis.com/auth/drive']

FOLDER_MIME_TYPE = 'application/vnd.google-apps.folder'


# Authenticate and create the service
def authenticate():
    """Authenticate and return the service."""
    creds = None
    # token.json stores the user's access and refresh tokens. It is created
    # automatically when the authorization flow completes for the first time.
    if os.path.exists('token.json'):
        creds = Credentials.from_authorized_user_file('token.json', SCOPES)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=8080)
        # Save the credentials for the next run
        with open('token.json', 'w') as token:
            token.write(creds.to_json())
    service = build('drive', 'v3', credentials=creds)
    return service


def get_folder_contents(service, folder_id):
    """Get all files and subfolders in a folder, handling pagination."""
    items = []
    page_token = None
    while True:
        # List one page of the folder's children, skipping items in the trash
        results = service.files().list(
            q=f"'{folder_id}' in parents and trashed = false",
            fields="nextPageToken, files(id, name, mimeType)",
            pageToken=page_token
        ).execute()
        # Add the files from this page to the list of items
        items.extend(results.get('files', []))
        # Check if there is another page of results
        page_token = results.get('nextPageToken')
        if not page_token:
            break  # No more pages, exit the loop
    print(f"returning {len(items)} items")
    return items


def copy_file(service, file_id, folder_id):
    """Copy a file to the new folder, preserving the original file name."""
    # Get the file's metadata to preserve its original name
    file = service.files().get(fileId=file_id, fields='name').execute()
    file_name = file['name']
    # Prepare the metadata for the copy operation
    file_metadata = {'name': file_name, 'parents': [folder_id]}
    # Copy the file to the new folder, asking for the fields printed below
    copied_file = service.files().copy(
        fileId=file_id, body=file_metadata, fields='id, name, mimeType'
    ).execute()
    print(f"Copied file: {copied_file['name']} - original_id={file_id}, new_id={copied_file['id']}, mimeType={copied_file['mimeType']}")
    print(f"{copied_file=}")
    return copied_file['id']


# After copying the file, restore timestamps
def restore_timestamps(service, copied_file_id, original_file_id):
    """Set the copy's modifiedTime to match the original file."""
    original_file = service.files().get(fileId=original_file_id, fields='modifiedTime').execute()
    modified_time = original_file['modifiedTime']
    # Only modifiedTime is restored here; createdTime is left as set by the copy
    updated_file_metadata = {'modifiedTime': modified_time}
    service.files().update(fileId=copied_file_id, body=updated_file_metadata).execute()
    print(f"Restored modifiedTime for copied file {copied_file_id}")


# Copy a folder
def copy_folder(service, source_folder_id, destination_folder_id):
    """Recursively copy a folder and its contents."""
    # List everything directly inside the source folder
    items = get_folder_contents(service, source_folder_id)
    for item in items:
        if item['mimeType'] == FOLDER_MIME_TYPE:  # If it's a folder
            # Create the folder in the destination
            folder_metadata = {'name': item['name'], 'mimeType': FOLDER_MIME_TYPE, 'parents': [destination_folder_id]}
            new_folder = service.files().create(body=folder_metadata, fields='id, name').execute()
            print(f"Created folder: {new_folder['name']}")
            restore_timestamps(service, copied_file_id=new_folder['id'], original_file_id=item['id'])
            # Recursively copy the contents of this folder
            copy_folder(service, item['id'], new_folder['id'])
        else:
            # Copy the file to the destination folder
            copied_file_id = copy_file(service, item['id'], destination_folder_id)
            restore_timestamps(service, copied_file_id=copied_file_id, original_file_id=item['id'])


# Main function to copy folder
def copy_drive_folder(source_folder_id, destination_folder_id):
    service = authenticate()
    copy_folder(service, source_folder_id, destination_folder_id)


if __name__ == '__main__':
    source_folder_id = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'  # Replace with your source folder ID
    destination_folder_id = 'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY'  # Replace with your destination folder ID
    copy_drive_folder(source_folder_id, destination_folder_id)

You'll also need to pip install dependencies:

pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib
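
For double-checking the results, a comparison of the source and destination trees can be sketched as follows. The helper names `collect_paths` and `compare_trees` are hypothetical, and the listing function is passed in as a parameter so the comparison logic does not depend on the Drive API directly. It compares by name only, so duplicate names within one folder and empty folders are not detected.

```python
FOLDER_MIME = 'application/vnd.google-apps.folder'


def collect_paths(list_children, folder_id, prefix=''):
    """Return the set of relative file paths under folder_id.

    list_children is any callable that takes a folder ID and returns a list
    of dicts with 'id', 'name' and 'mimeType' keys.
    """
    paths = set()
    for item in list_children(folder_id):
        path = f"{prefix}{item['name']}"
        if item['mimeType'] == FOLDER_MIME:
            # Recurse into subfolders, extending the relative path prefix
            paths |= collect_paths(list_children, item['id'], f"{path}/")
        else:
            paths.add(path)
    return paths


def compare_trees(list_children, source_folder_id, destination_folder_id):
    """Return (missing, extra) relative paths between the two folder trees."""
    source = collect_paths(list_children, source_folder_id)
    destination = collect_paths(list_children, destination_folder_id)
    return sorted(source - destination), sorted(destination - source)
```

With the copy script, `list_children` could be `lambda folder_id: get_folder_contents(service, folder_id)`. A non-empty `missing` list points at files that did not make it to the destination.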

Gmail Solution


I didn't really have much of a problem here because I've been forwarding my email from my Google Workspace account to my personal account for years. I've also set up OfflineIMAP so I can keep local copies of my emails.


Solutions for other Google services


I don't think I really have anything else there, but I used Google Takeout to download everything else just in case.


If you enjoyed this post, please let me know on Twitter or Mastodon.


Posted December 27, 2024.


Tags: #python

diff --git a/rss.xml b/rss.xml
index d1a219f..c2d788e 100644
--- a/rss.xml
+++ b/rss.xml
@@ -8,6 +8,14 @@
 en-us
+ Frustrations with Google Takeout
+ https://mulcahy.ca/posts/frustrations-with-google-takeout.html
+ Problems I experienced with Google Takeout and how I worked around them
+ 2024-12-27 06:08:28
+ posts/frustrations-with-google-takeout.html
 CLIs with updating statuses
 https://mulcahy.ca/posts/clis-with-updating-statuses.html
 Different methods for updating statuses inside a CLI
diff --git a/tags/python.html b/tags/python.html
index d225eb0..d9ff815 100644
--- a/tags/python.html
+++ b/tags/python.html
@@ -98,6 +98,12 @@


+ Frustrations with Google Takeout
Simple password generator in Python