GenericFileSystem Buffered Copy #1578
Indeed, it has proven hard to implement `open_async` even for filesystems that are fundamentally async. Just because each operation is async doesn't necessarily mean you can leave a connection open in the background. In particular, there seems to be no good way to write asynchronously to a file (more below). The file caching process seemed like a good intermediate place, where we can stream from multiple files and write to multiple files in batches without being limited by strict streaming. Details: aiohttp allows for async writes using a pull pattern, where you can provide an async generator. However, our situation is push, because we are waiting on reads from another async stream, so the typical
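The push/pull mismatch described above can, in principle, be bridged with a bounded queue: the producer pushes chunks as its reads complete, and an async generator lets a pull-style consumer (such as a writer that accepts an async generator) iterate over them. This is a minimal sketch using only `asyncio`; `PushToPull` and its methods are hypothetical names, and it has not been tried against fsspec or aiohttp.

```python
import asyncio
from typing import AsyncIterator

_DONE = object()  # sentinel marking the end of the stream


class PushToPull:
    """Hypothetical adapter: producers push bytes, a consumer pulls them."""

    def __init__(self, maxsize: int = 8) -> None:
        # A bounded queue gives back-pressure: push() waits while the consumer lags.
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=maxsize)

    async def push(self, chunk: bytes) -> None:
        await self._queue.put(chunk)

    async def close(self) -> None:
        await self._queue.put(_DONE)

    async def chunks(self) -> AsyncIterator[bytes]:
        # Async generator that a pull-based consumer can iterate over.
        while True:
            item = await self._queue.get()
            if item is _DONE:
                return
            yield item


async def demo() -> bytes:
    bridge = PushToPull(maxsize=2)

    async def producer() -> None:
        # Stands in for "waiting on reads from another async stream".
        for part in (b"hello ", b"async ", b"world"):
            await bridge.push(part)
        await bridge.close()

    async def consumer() -> bytes:
        # Stands in for a pull-based writer that consumes an async generator.
        out = bytearray()
        async for chunk in bridge.chunks():
            out.extend(chunk)
        return bytes(out)

    _, result = await asyncio.gather(producer(), consumer())
    return result


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A real version would also need to propagate producer errors to the consumer; as written, a producer that fails before calling `close()` leaves the consumer waiting forever.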
@martindurant, that context is great, thanks. I'm curious about the 3 options I listed above. I think the "fastest" win (selfishly, for me) would be to figure out how to get
Can't we just use https://pypi.org/project/aioshutil/ for the aio version of `shutil.copyfileobj`?
Not too bad to implement on your own, apparently: Tinche/aiofiles#61 (comment)
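For a single file, an async counterpart of `shutil.copyfileobj` is a chunked read/write loop. This is a minimal sketch, assuming only that the source has an awaitable `read(n)` and the destination an awaitable `write(data)` (as aiofiles file objects do); `async_copyfileobj` and the `Fake*` classes are hypothetical, the latter only there to keep the block self-contained. Note that the copier still drives both ends from one loop, so by itself this does not solve the push/pull mismatch raised in this thread.

```python
import asyncio
from typing import Awaitable, Protocol


class AsyncReader(Protocol):
    def read(self, size: int) -> Awaitable[bytes]: ...


class AsyncWriter(Protocol):
    def write(self, data: bytes) -> Awaitable[object]: ...


async def async_copyfileobj(src: AsyncReader, dst: AsyncWriter, length: int = 64 * 1024) -> int:
    """Copy src to dst in chunks of `length` bytes; return the number of bytes copied."""
    copied = 0
    while True:
        chunk = await src.read(length)
        if not chunk:  # empty bytes signals EOF, as with shutil.copyfileobj
            return copied
        await dst.write(chunk)
        copied += len(chunk)


class FakeAsyncReader:
    """In-memory stand-in for an async file opened for reading."""

    def __init__(self, data: bytes) -> None:
        self._data = data
        self._pos = 0

    async def read(self, size: int) -> bytes:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        chunk = self._data[self._pos : self._pos + size]
        self._pos += len(chunk)
        return chunk


class FakeAsyncWriter:
    """In-memory stand-in for an async file opened for writing."""

    def __init__(self) -> None:
        self.buffer = bytearray()

    async def write(self, data: bytes) -> int:
        await asyncio.sleep(0)
        self.buffer.extend(data)
        return len(data)


async def demo() -> bytes:
    src = FakeAsyncReader(b"x" * 200_000)
    dst = FakeAsyncWriter()
    await async_copyfileobj(src, dst, length=64 * 1024)
    return bytes(dst.buffer)
```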
aioshutil would have had the same pull problem I described above. We don't know how to make a generic
It would be OK, maybe. We'd be restricted to working on one file at a time, which works for `cp_file()` but not at all for `copy()`.
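On the one-file-at-a-time restriction: per-file copies can still be run concurrently for `copy()`-style workloads by bounding how many are in flight. This is a minimal asyncio sketch; `copy_many`, `copy_one` and `batch_size` are hypothetical names, and `fake_copy` stands in for a real single-file async copy.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple

CopyOne = Callable[[str, str], Awaitable[None]]


async def copy_many(pairs: Iterable[Tuple[str, str]], copy_one: CopyOne, batch_size: int = 4) -> None:
    """Run per-file copies concurrently, with at most `batch_size` in flight."""
    limit = asyncio.Semaphore(batch_size)

    async def guarded(src: str, dst: str) -> None:
        async with limit:
            await copy_one(src, dst)

    # gather re-raises the first failure; the remaining copies are not cancelled automatically.
    await asyncio.gather(*(guarded(s, d) for s, d in pairs))


async def demo() -> Tuple[Dict[str, str], int]:
    store = {f"src/{i}": f"data-{i}" for i in range(10)}
    in_flight = 0
    peak = 0

    async def fake_copy(src: str, dst: str) -> None:
        # Hypothetical single-file copy; records how many run at once.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # simulated I/O
        store[dst] = store[src]
        in_flight -= 1

    pairs = [(f"src/{i}", f"dst/{i}") for i in range(10)]
    await copy_many(pairs, fake_copy, batch_size=3)
    return store, peak
```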
Hi all. Love fsspec.
I'm trying to use GenericFileSystem like this:
This will currently use the `fsspec.generic.GenericFileSystem._copy` method, which creates a temp file on disk (by default), e.g. sftp -> local -> gs. This is undesirable for my use case.
Assumption 1
When looking at `GenericFileSystem`, there is a buffering implementation in `fsspec.generic.GenericFileSystem._cp_file`. However, I don't think that method will ever be called, because `_copy` has been implemented (unless a call to it is added to `_copy`).
Force `_cp_file` to be used
If I remove `_copy` (rename it to `__copy`), then `_cp_file` is in fact called, but there is a problem: `open_async` is not implemented by either of the filesystems in my example (`sshfs.spec.SSHFileSystem` and `gcsfs.core.GCSFileSystem`).
Assumption 2
This is very confusing to me, because `SSHFileSystem` and `GCSFileSystem` both extend `fsspec.asyn.AsyncFileSystem`, but neither of them implements `open_async`. So in `GenericFileSystem._cp_file`, when the `if hasattr(fs, "open_async")` checks are done, they return `True`, because the filesystems technically have that attribute/method, but it is not implemented.
Force sync open
If a sync `open` is forced in `GenericFileSystem._cp_file`, then another error is raised, because we are in an async context but trying to call a sync method.
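One generic asyncio pattern for calling a sync method from an async context is to move the blocking call onto a worker thread, so the event loop is not blocked. This sketch uses `asyncio.to_thread` (Python 3.9+); `blocking_copy` is a hypothetical stand-in for a sync `open()`-based copy, and in-memory buffers replace real files. Whether this interacts cleanly with fsspec's own event-loop and thread handling for a given filesystem is an unverified assumption.

```python
import asyncio
import io
import shutil
import time
from typing import Tuple


def blocking_copy(src: io.BytesIO, dst: io.BytesIO, length: int = 16) -> int:
    """Hypothetical sync copy, standing in for sync fs.open()/read()/write() calls."""
    time.sleep(0.05)  # simulated blocking I/O
    shutil.copyfileobj(src, dst, length)
    return dst.tell()  # bytes written so far


async def copy_from_async_context(src: io.BytesIO, dst: io.BytesIO) -> int:
    # Run the blocking copy in a worker thread; the event loop stays free for other tasks.
    return await asyncio.to_thread(blocking_copy, src, dst)


async def demo() -> Tuple[int, bytes]:
    src, dst = io.BytesIO(b"payload" * 10), io.BytesIO()
    copied = await copy_from_async_context(src, dst)
    return copied, dst.getvalue()
```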
Question
What to do here?
`rsync` that doesn't try to be generic? (kind of already did this, heavily based on `GenericFileSystem`)
`open_async` in both `SSHFileSystem` and `GCSFileSystem`
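On the `open_async` option and the `hasattr` confusion under Assumption 2: `hasattr` is true for any inherited method, including a base-class stub that only raises `NotImplementedError`. A stricter check compares the function found on the object's class with the one on the base class. In this sketch, `BaseAsyncFS` is a hypothetical stand-in for `fsspec.asyn.AsyncFileSystem` (to keep the block self-contained); with fsspec installed, the same comparison could be made against `AsyncFileSystem`, assuming it defines `open_async` as a stub. `overrides` and `pick_open` are also hypothetical helpers.

```python
class BaseAsyncFS:
    """Hypothetical stand-in for an async base class with a stub method."""

    async def open_async(self, path: str, mode: str = "rb", **kwargs):
        raise NotImplementedError


class NoOpenAsyncFS(BaseAsyncFS):
    """Like the filesystems in the report: inherits the stub unchanged."""


class WithOpenAsyncFS(BaseAsyncFS):
    async def open_async(self, path: str, mode: str = "rb", **kwargs):
        return f"opened {path} ({mode})"


def overrides(obj: object, name: str, base: type) -> bool:
    """True if some class between type(obj) and `base` redefines `name`."""
    # In Python 3, looking a function up on a class returns the plain function,
    # so an inherited, un-overridden method is the very same object as base's.
    return getattr(type(obj), name, None) is not getattr(base, name, None)


def pick_open(fs: object) -> str:
    # Same shape as a hasattr-based dispatch, but with the stricter check.
    return "open_async" if overrides(fs, "open_async", BaseAsyncFS) else "sync open"
```

This only detects that a method was redefined; a subclass could still override `open_async` and support just some modes.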