Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement H5wasmLocalFileProvider #1604

Merged
merged 1 commit into from
Apr 9, 2024
Merged

Implement H5wasmLocalFileProvider #1604

merged 1 commit into from
Apr 9, 2024

Conversation

axelboc
Copy link
Contributor

@axelboc axelboc commented Apr 5, 2024

Fix #1582 🎉

This is a first implementation of H5WasmLocalFileProvider, a provider that allows random access to local HDF5 files with h5wasm in a web worker, thus hopefully removing the ~2 GB file size limit inherent to the original, buffer-based H5WasmProvider.

H5WasmProvider remains available and unchanged. The demo just picks the right provider based on whether the user opens/drops a local file or enters the URL of a remote file.

The web worker is initialised with type: 'module' which limits browser support (FF >= 114). I'll have to investigate a way to detect support in consumer applications like myHDF5 to allow falling back to H5WasmProvider on unsupported browsers.

Comlink provides an RPC abstraction to interact with the web worker. The worker exposes a number of async methods to open a file, get an entity's metadata, and read dataset and attribute values, in line with the DataProviderApi requirements. Every call to h5wasm is done synchronously inside the worker, which considerably simplifies its implementation and the implementation of the provider's API class. The downside is that the worker cannot be extracted into a generic package (as was previously the plan with h5wasm-worker).

Next step will be to add the remaining provider features (compression plugins, export ...)

@axelboc axelboc force-pushed the local-worker branch 4 times, most recently from 5249164 to fba632e Compare April 8, 2024 10:00
apps/demo/src/h5wasm/DropZone.tsx Show resolved Hide resolved
Comment on lines -65 to +50
<p className={styles.fromUrl}>
You may also provide a URL if your file is hosted remotely:
</p>
<UrlForm onH5File={onH5File} />
{children}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small refactoring here to pass UrlForm as a child for better composition.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it is really needed. DropZone is not used elsewhere ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it just felt weird having the UrlForm rendered by DropZone directly. They both allow providing a file but in different ways, so I though it was better to have them both being rendered from H5WasmApp.

Comment on lines +29 to +34
return h5wasm.ccall(
'H5Fopen',
'bigint',
['string', 'number', 'bigint'],
[filePath, h5wasm.H5F_ACC_RDONLY, 0n],
);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Contrary to the approach initially taken in h5wasm-worker, I decided to not expose H5Wasm's high-level classes through comlink, like File, Dataset, etc.

The problem with the previous approach is that methods like Group#get() return class instances that are not fully serializable. This meant that we had to re-instantiate the high-level classes from outside the worker – as a result, the methods all became async, which made the code difficult to understand and debug. It felt like too many layers of abstraction.

The approach I'm taking instead is to interact directly with h5wasm's low-level API. Here, this means calling h5wasm.ccall, since all I need to return is the HDF5 fileId. I could have done something like return new File("my-filename.h5").fileId, but I chose to be explicit, especially since new File handles the case of creating a file, which we don't care about.

Further down, in getValue and getAttrValue, I did not have this luxury since the low-level API (_value_getter, process_data, etc.) is not currently exposed and the even lower-level calls to the HDF5 library are not trivial: https://github.com/usnistgov/h5wasm/blob/main/src/hdf5_hl.ts#L1055

I went with a compromise for now, but I think there's an opportunity to expose a "middle-level" API in H5Wasm to abstract away communicating with the HDF5 library, while remaining lower-level than the current class-based API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand your point here. Let's discuss in person next time.

packages/h5wasm/src/local/worker.ts Show resolved Hide resolved
isChild?: false,
): ProvidedEntity;

export function parseEntity(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This essentially duplicates the parseEntity utility in h5wasm/src/utils but takes the h5wasm module and a fileId as params instead of an H5Wasm entity object. Nothing prevents the current provider from adopting the same approach as the new provider, so refactoring is definitely on the agenda ... but no rush.

@axelboc axelboc marked this pull request as ready for review April 8, 2024 11:45
@axelboc axelboc requested a review from loichuder April 8, 2024 11:45
Copy link
Member

@loichuder loichuder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am amazed how concise and well-behaved this ended up.

Great work 👏

Comment on lines -65 to +50
<p className={styles.fromUrl}>
You may also provide a URL if your file is hosted remotely:
</p>
<UrlForm onH5File={onH5File} />
{children}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure it is really needed. DropZone is not used elsewhere ?

@axelboc axelboc merged commit 53ad1dd into main Apr 9, 2024
8 checks passed
@axelboc axelboc deleted the local-worker branch April 9, 2024 12:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Webworker h5wasm provider (for random access to local files)
2 participants