-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement H5wasmLocalFileProvider
#1604
Conversation
5249164
to
fba632e
Compare
<p className={styles.fromUrl}> | ||
You may also provide a URL if your file is hosted remotely: | ||
</p> | ||
<UrlForm onH5File={onH5File} /> | ||
{children} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small refactoring here to pass UrlForm
as a child for better composition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it is really needed. DropZone
is not used elsewhere ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it just felt weird having the UrlForm
rendered by DropZone
directly. They both allow providing a file but in different ways, so I though it was better to have them both being rendered from H5WasmApp
.
return h5wasm.ccall( | ||
'H5Fopen', | ||
'bigint', | ||
['string', 'number', 'bigint'], | ||
[filePath, h5wasm.H5F_ACC_RDONLY, 0n], | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contrary to the approach initially taken in h5wasm-worker, I decided to not expose H5Wasm's high-level classes through comlink, like File
, Dataset
, etc.
The problem with the previous approach is that methods like Group#get()
return class instances that are not fully serializable. This meant that we had to re-instantiate the high-level classes from outside the worker – as a result, the methods all became async, which made the code difficult to understand and debug. It felt like too many layers of abstraction.
The approach I'm taking instead is to interact directly with h5wasm's low-level API. Here, this means calling h5wasm.ccall
, since all I need to return is the HDF5 fileId
. I could have done something like return new File("my-filename.h5").fileId
, but I chose to be explicit, especially since new File
handles the case of creating a file, which we don't care about.
Further down, in getValue
and getAttrValue
, I did not have this luxury since the low-level API (_value_getter
, process_data
, etc.) is not currently exposed and the even lower-level calls to the HDF5 library are not trivial: https://github.com/usnistgov/h5wasm/blob/main/src/hdf5_hl.ts#L1055
I went with a compromise for now, but I think there's an opportunity to expose a "middle-level" API in H5Wasm to abstract away communicating with the HDF5 library, while remaining lower-level than the current class-based API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand your point here. Let's discuss in person next time.
isChild?: false, | ||
): ProvidedEntity; | ||
|
||
export function parseEntity( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This essentially duplicates the parseEntity
utility in h5wasm/src/utils
but takes the h5wasm
module and a fileId
as params instead of an H5Wasm entity
object. Nothing prevents the current provider from adopting the same approach as the new provider, so refactoring is definitely on the agenda ... but no rush.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am amazed how concise and well-behaved this ended up.
Great work 👏
<p className={styles.fromUrl}> | ||
You may also provide a URL if your file is hosted remotely: | ||
</p> | ||
<UrlForm onH5File={onH5File} /> | ||
{children} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure it is really needed. DropZone
is not used elsewhere ?
Fix #1582 🎉
This is a first implementation of
H5WasmLocalFileProvider
, a provider that allows random access to local HDF5 files with h5wasm in a web worker, thus hopefully removing the ~2 GB file size limit inherent to the original, buffer-basedH5WasmProvider
.H5WasmProvider
remains available and unchanged. The demo just picks the right provider based on whether the user opens/drops a local file or enters the URL of a remote file.The web worker is initialised with
type: 'module'
which limits browser support (FF >= 114). I'll have to investigate a way to detect support in consumer applications like myHDF5 to allow falling back toH5WasmProvider
on unsupported browsers.Comlink provides an RPC abstraction to interact with the web worker. The worker exposes a number of async methods to open a file, get an entity's metadata, and read dataset and attribute values, in line with the
DataProviderApi
requirements. Every call to h5wasm is done synchronously inside the worker, which considerably simplifies its implementation and the implementation of the provider's API class. The downside is that the worker cannot be extracted into a generic package (as was previously the plan withh5wasm-worker
).Next step will be to add the remaining provider features (compression plugins, export ...)