Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure reference FS wraps any sync filesystems #1755

Merged
merged 4 commits into from
Dec 5, 2024

Conversation

moradology
Copy link
Contributor

This PR guards against the possibility (and likelihood, as LocalFileSystem is the default value here) that non-async filesystems are used in ReferenceFileSystem. This is important, as ReferenceFileSystem is, itself, an AsyncFileSystem and it is currently failing to provide functional methods required by that interface (cf. https://github.com/fsspec/filesystem_spec/blob/master/fsspec/implementations/reference.py#L809)

@martindurant
Copy link
Member

You are thinking of a referenceFS called from an async context?

@moradology
Copy link
Contributor Author

The issue arises here, I believe: https://github.com/zarr-developers/zarr-python/blob/main/src/zarr/storage/remote.py#L219-L225 Here, the _cat_file method is explicitly called and it looks as though the default local filesystem is not playing nicely in that case

@moradology
Copy link
Contributor Author

Test failures here are pretty strange. Don't think these changes should have any effect on test_async.py - certainly nothing obvious

@martindurant
Copy link
Member

Those tests seem to be counting the number of coroutines, but now you are wrapping the coroutines inside coroutines? Maybe there's a more rigorous way to count.

@moradology
Copy link
Contributor Author

OK, interesting. Perhaps I'm missing something crucial about how that might be happening, because the failures in test_async.py don't seem like they'd implicate ReferenceFileSystem or the AsyncFileSystemWrapper

@martindurant
Copy link
Member

Actually this is the error. The stuff about async is in the cleanup stage for those coroutines (which ought to be awaited in the test even when they fail, but this doesn't cause a failure)

m = <fsspec.implementations.memory.MemoryFileSystem object at 0x7f1b6cc6dc10>

    def test_fss_has_defaults(m):
        fs = fsspec.filesystem("reference", fo={})
        assert None in fs.fss
    
        fs = fsspec.filesystem("reference", fo={}, remote_protocol="memory")
>       assert fs.fss[None].protocol == "memory"
E       AssertionError: assert 'abstract' == 'memory'
E         
E         - memory
E         + abstract

@martindurant
Copy link
Member

To fix the async thing (if you wish to try), probably in this block from _run_coros_in_chunks, all remaining coroutines should be cancelled and awaited

        try:
            return await asyncio.wait_for(coro, timeout=timeout), i
        except Exception as e:
            if not return_exceptions:
                # <- cancel pending coroutines
                raise
            return e, i

(or maybe try/except at await asyncio.wait(..) a few lines down)

@moradology
Copy link
Contributor Author

The remaining s3fs test failure is pretty strange! Looks to me like conda is installing the wrong version? Came across this: eWaterCycle/ewatercycle#468

Not sure of the best way to test it, unfortunately, as this is pretty specific to CI. Looks like we may be able to simply pin the version over in s3fs (or maybe here?): pyopenssl>=24.0.0

@martindurant
Copy link
Member

It didn't show up in the s3fs CI (yet), so we can pin. it here. I am confused too...

@martindurant
Copy link
Member

I mean, may as well put it in this PR. v24.0.0 is almost a year old anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants