Webworker h5wasm provider (for random access to local files) #1582
Comments
I tested this code and it works: I am able to load a local file and read values, keys, attributes, etc. with h5wasm, without reading the whole file contents into memory. I don't know how to bundle a worker script including all of its dependencies (it seems kind of tricky), which is why it's written in plain JS with importScripts above. NOTE: |
Thanks for opening an issue to track this! 💯
Yeah last time I tried, this is where I hit a wall. Vite has improved a lot since then, so I'll try again asap. |
Yes, I think things have improved! I replaced the
// h5wasm_worker.ts
import * as Comlink from "comlink";
import h5wasm from "h5wasm"; |
After a whole morning of hair pulling:
|
Yes, I think I found that On the other hand, once the worker is bundled with esbuild or another bundler as above, it no longer contains any import statements, so it should be usable in any context, I would think. For Electron-like apps (VS Code?) I imagine you can use |
I have been playing around with this... would it be useful to include a special build of h5wasm that uses a worker in the main h5wasm package? Here is a setup that works:

// lib_worker.ts
import * as h5wasm from 'h5wasm';

const WORKERFS_MOUNT = '/workerfs';

// Register a browser File under the WORKERFS mount without copying its contents.
async function save_to_workerfs(file: File) {
  const { FS, WORKERFS, mount } = await workerfs_promise;
  const { name: filename } = file;
  const output_path = `${WORKERFS_MOUNT}/${filename}`;
  if (FS.analyzePath(output_path).exists) {
    console.warn(`File ${output_path} already exists. Overwriting...`);
  }
  // File.lastModifiedDate is deprecated; derive the timestamp from lastModified instead.
  WORKERFS.createNode(mount, filename, WORKERFS.FILE_MODE, 0, file, new Date(file.lastModified));
  return output_path;
}

// Create the mount point if needed, then mount an (initially empty) WORKERFS there.
async function _mount_workerfs() {
  const { FS } = await h5wasm.ready;
  const { filesystems: { WORKERFS } } = FS;
  if (!FS.analyzePath(WORKERFS_MOUNT).exists) {
    FS.mkdir(WORKERFS_MOUNT);
  }
  const mount = FS.mount(WORKERFS, {}, WORKERFS_MOUNT);
  return { FS, WORKERFS, mount };
}

const workerfs_promise = _mount_workerfs();
export const api = {
  ready: h5wasm.ready,
  save_to_workerfs,
  H5WasmFile: h5wasm.File,
  Dataset: h5wasm.Dataset,
  Group: h5wasm.Group,
  Datatype: h5wasm.Datatype,
  BrokenSoftLink: h5wasm.BrokenSoftLink,
};

// worker.ts
import * as Comlink from 'comlink';
import { api } from './lib_worker';
Comlink.expose(api);

// worker_proxy.ts
import * as Comlink from 'comlink';
import type { api } from './lib_worker.ts';
import { ACCESS_MODES } from './hdf5_hl.ts';
import type { File as H5WasmFile, Group, Dataset, Datatype, BrokenSoftLink } from './hdf5_hl.ts';
export type { H5WasmFile, Group, Dataset, Datatype, BrokenSoftLink };

type ACCESS_MODESTRING = keyof typeof ACCESS_MODES;

const worker = new Worker('./worker.js');
const remote = Comlink.wrap(worker) as Comlink.Remote<typeof api>;

export class GroupProxy {
  proxy: Comlink.Remote<Group>;
  file_id: bigint;

  constructor(proxy: Comlink.Remote<Group>, file_id: bigint) {
    this.proxy = proxy;
    this.file_id = file_id;
  }

  async keys() {
    return await this.proxy.keys();
  }

  async paths() {
    return await this.proxy.paths();
  }

  async get(name: string = "/") {
    const dumb_obj = await this.proxy.get(name);
    // convert to a proxy of the object:
    if (dumb_obj?.type === "Group") {
      const new_group_proxy = await new remote.Group(dumb_obj.file_id, dumb_obj.path);
      return new GroupProxy(new_group_proxy, this.file_id);
    }
    else if (dumb_obj?.type === "Dataset") {
      return new remote.Dataset(dumb_obj.file_id, dumb_obj.path);
    }
    else if (dumb_obj?.type === "Datatype") {
      return new remote.Datatype();
    }
    else if (dumb_obj?.type === "BrokenSoftLink") {
      return new remote.BrokenSoftLink(dumb_obj?.target);
    }
    return;
  }
}
export class FileProxy extends GroupProxy {
  filename: string;
  mode: ACCESS_MODESTRING;

  constructor(proxy: Comlink.Remote<H5WasmFile>, file_id: bigint, filename: string, mode: ACCESS_MODESTRING = 'r') {
    super(proxy, file_id);
    this.filename = filename;
    this.mode = mode;
  }
}

export async function get_file_proxy(filename: string, mode: ACCESS_MODESTRING = 'r') {
  const file_proxy = await new remote.H5WasmFile(filename, mode);
  const file_id = await file_proxy.file_id;
  return new FileProxy(file_proxy, file_id, filename, mode);
}

export async function save_file(file: File) {
  const { name, size } = file;
  // Log the byte size (the original logged lastModified here by mistake).
  console.log(`Saving file ${name} of size ${size} to workerfs...`);
  return await remote.save_to_workerfs(file);
}

Which is then built with these two esbuild commands:
npx esbuild --format=esm --bundle worker.ts > worker.js;
npx esbuild --format=esm --bundle worker_proxy.ts > worker_proxy.mjs;
The resulting library can be used by importing The reason there are three files instead of two is that it's difficult to build |
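For anyone wondering why GroupProxy.get has to "convert to a proxy": as far as I understand, values returned across the worker boundary are structured-cloned by default, so a Group or Dataset arrives on the main thread as plain data (type, file_id, path) with no methods. Here is a minimal sketch of that re-wrapping step on its own. The names `Plain`, `Wrappers` and `rewrap` are hypothetical and only for illustration; in the setup above the wrapper callbacks would create Comlink remotes.

```typescript
// Shapes of the plain objects that arrive on the main thread. The field
// names follow the ones read in GroupProxy.get above.
type Plain =
  | { type: "Group"; file_id: bigint; path: string }
  | { type: "Dataset"; file_id: bigint; path: string }
  | { type: "Datatype" }
  | { type: "BrokenSoftLink"; target: string };

// Hypothetical factory callbacks, one per kind of object.
interface Wrappers<G, D, T, L> {
  group(file_id: bigint, path: string): G;
  dataset(file_id: bigint, path: string): D;
  datatype(): T;
  brokenSoftLink(target: string): L;
}

// Dispatch on the `type` tag and rebuild a usable wrapper, or return
// undefined when nothing was found at the requested path.
function rewrap<G, D, T, L>(
  obj: Plain | null | undefined,
  wrappers: Wrappers<G, D, T, L>,
): G | D | T | L | undefined {
  if (!obj) {
    return undefined;
  }
  switch (obj.type) {
    case "Group":
      return wrappers.group(obj.file_id, obj.path);
    case "Dataset":
      return wrappers.dataset(obj.file_id, obj.path);
    case "Datatype":
      return wrappers.datatype();
    case "BrokenSoftLink":
      return wrappers.brokenSoftLink(obj.target);
  }
}
```

Keeping the dispatch in one function like this would also let the compiler flag a missing case if a new object kind is added to the union.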
Wow, this is brilliant! Definitely a nice approach. I'll try to get my head around it a bit more to understand how this will fit into the existing H5Web provider code (notably with loading compression plugins) but either way, we can iterate. I'm planning on providing a separate provider, maybe |
I moved these files to usnistgov/h5wasm#70 and added a method for writing bytes to a MEMFS file, which I used to load a plugin in the example code there. It's still a bit awkward, and I'm realizing there's really no use case for interacting with the filesystems within the worker except to load files and plugins. So I might rejigger my API so that, e.g., a save function returns an H5WasmFile proxy instead of just a file path on success. It would also make sense to create a few API functions for loading plugin files and maybe listing the contents of the plugin folder. |
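To make the "few API functions for plugins" idea concrete, here is a rough sketch of what the worker side could look like. It is not the code from usnistgov/h5wasm#70: `PluginFS`, `savePlugin`, `listPlugins` and the `/plugins` folder are hypothetical names. The only things assumed from Emscripten are the FS calls `analyzePath`, `mkdir`, `writeFile` and `readdir`.

```typescript
// Structural stand-in for the subset of Emscripten's FS API used here, so
// the sketch type-checks on its own.
interface PluginFS {
  analyzePath(path: string): { exists: boolean };
  mkdir(path: string): unknown;
  writeFile(path: string, data: Uint8Array): void;
  readdir(path: string): string[];
}

// Hypothetical helper: write plugin bytes into the in-memory file system.
// Assumes the parent of `pluginDir` already exists.
function savePlugin(
  FS: PluginFS,
  filename: string,
  bytes: Uint8Array,
  pluginDir = "/plugins", // assumed location, not an h5wasm default
): string {
  if (!FS.analyzePath(pluginDir).exists) {
    FS.mkdir(pluginDir);
  }
  const path = `${pluginDir}/${filename}`;
  FS.writeFile(path, bytes);
  return path;
}

// Hypothetical helper: list what is currently in the plugin folder.
function listPlugins(FS: PluginFS, pluginDir = "/plugins"): string[] {
  if (!FS.analyzePath(pluginDir).exists) {
    return [];
  }
  // Emscripten's readdir includes "." and "..", which callers rarely want.
  return FS.readdir(pluginDir).filter((name) => name !== "." && name !== "..");
}
```

Both functions take and return only plain data (strings and byte arrays), so they could be exposed through the worker API without any proxying.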
FYI, this is still on my mind. Last time I played with h5wasm-worker and tried to create an |
Is your code in a branch somewhere? I'd be happy to help debug. |
I've opened a The problem seems to come from |
I've made progress back in this repo by embracing I'll push forward, taking inspiration from what you've done in |
Yes - you have found the issue. I was about to write back to you. In fact, I think we can remove the reference to import.meta in the h5wasm module which will remove the need for |
Can you try now, without the |
I rebased the
Then, I implemented a very dumb worker in H5Web with
With this observation in hand, I tried once again to turn the existing
To make debugging easier, I decided to remove as many layers of abstraction as I could and started implementing a worker from scratch that works directly, and solely, with the
I'm not 100% sure I understand why, and I find it mind boggling that Vite was not warning me about this import somehow. Anyway, I think the approach of using the low-level |
After a few fixes (#1615 #1614), I can now confirm that I'll try to do the releases and upgrades asap. |
Is your feature request related to a problem?
Currently, the h5wasm provider must load the entire file into memory before use (it is written to the MEMFS virtual file system provided by Emscripten).
This puts an upper limit of 2GB (?) on the size of files that can be used with the h5wasm provider, and can cause memory issues for users (entire file in memory).
The only advantage of this system is that file access (once loaded) is very very fast.
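For contrast, browsers can already read a byte range from a local File or Blob without touching the rest of it, which is the kind of access a provider would need to avoid the full in-memory copy. Here is a minimal sketch of that idea. `readRange` and `SliceableBlob` are hypothetical names; the interface only mirrors the standard `Blob.slice` and `Blob.arrayBuffer` methods.

```typescript
// Structural subset of the standard Blob/File interface, so the sketch
// type-checks without DOM typings. A real File is expected to satisfy it.
interface SliceableBlob {
  readonly size: number;
  slice(start?: number, end?: number): SliceableBlob;
  arrayBuffer(): Promise<ArrayBuffer>;
}

// Hypothetical helper: read up to `length` bytes starting at `offset`.
// Only the requested range is materialized in memory.
async function readRange(
  blob: SliceableBlob,
  offset: number,
  length: number,
): Promise<Uint8Array> {
  if (offset < 0 || length < 0) {
    throw new RangeError("offset and length must be non-negative");
  }
  // Clamp the range to the end of the blob.
  const end = Math.min(offset + length, blob.size);
  const start = Math.min(offset, end);
  const buffer = await blob.slice(start, end).arrayBuffer();
  return new Uint8Array(buffer);
}
```

This trades the speed of in-memory access for a small read each time a range is requested.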
Requested solution or feature
Use the WORKERFS Emscripten file system and a webworker-based h5wasm provider, which allows random access to files on the user's computer without loading the entire thing into memory.
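As a rough illustration of the mounting step, here is a minimal sketch that passes the selected files to Emscripten's `FS.mount` through the WORKERFS `files` option. `mountLocalFiles`, `WorkerFSHost`, `NamedFile` and the `/workerfs` mount point are hypothetical names used for illustration. This has to run inside a worker, and WORKERFS is read-only.

```typescript
// Structural stand-ins for the parts of Emscripten's FS API used here, so
// the sketch type-checks on its own. Browser File objects satisfy NamedFile.
interface NamedFile {
  readonly name: string;
}
interface WorkerFSHost {
  analyzePath(path: string): { exists: boolean };
  mkdir(path: string): unknown;
  mount(type: unknown, opts: { files?: NamedFile[] }, mountpoint: string): unknown;
}

// Hypothetical helper: expose user-selected files under `mountPoint` and
// return the virtual paths at which they become readable.
function mountLocalFiles(
  FS: WorkerFSHost,
  WORKERFS: unknown,
  files: NamedFile[],
  mountPoint = "/workerfs",
): string[] {
  if (!FS.analyzePath(mountPoint).exists) {
    FS.mkdir(mountPoint);
  }
  // The files are referenced by the mount, not copied into memory.
  FS.mount(WORKERFS, { files }, mountPoint);
  return files.map((f) => `${mountPoint}/${f.name}`);
}
```

Each returned path could then be opened read-only with h5wasm inside the same worker.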
Alternatives you've considered
The new File System Access API could also solve this problem: users could mount a local folder to work with and get random access to the files in that folder. However, this API is only fully implemented in Chromium-based browsers.
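For reference, here is a minimal sketch of how that alternative might look: walk a directory handle (as returned by `showDirectoryPicker()` in supporting browsers) and collect the HDF5 files in it. `collectFiles`, the `*Like` interfaces and the extension list are hypothetical; a real `FileSystemDirectoryHandle` may need a cast to fit these types, depending on your DOM typings.

```typescript
// Structural stand-ins for the File System Access API handle types.
interface FileLike {
  readonly name: string;
  readonly size: number;
}
interface FileHandleLike {
  readonly kind: "file";
  readonly name: string;
  getFile(): Promise<FileLike>;
}
interface DirectoryHandleLike {
  readonly kind: "directory";
  readonly name: string;
  values(): AsyncIterable<FileHandleLike | DirectoryHandleLike>;
}

// Hypothetical helper: recursively collect files whose names end with one
// of `extensions`, along with their path relative to the picked folder.
async function collectFiles(
  dir: DirectoryHandleLike,
  extensions: string[] = [".h5", ".hdf5"], // assumed list, adjust as needed
  prefix = "",
): Promise<{ path: string; file: FileLike }[]> {
  const found: { path: string; file: FileLike }[] = [];
  for await (const entry of dir.values()) {
    const path = `${prefix}${entry.name}`;
    if (entry.kind === "directory") {
      found.push(...(await collectFiles(entry, extensions, `${path}/`)));
    } else if (extensions.some((ext) => entry.name.toLowerCase().endsWith(ext))) {
      found.push({ path, file: await entry.getFile() });
    }
  }
  return found;
}
```

The File objects returned by `getFile()` can be sliced lazily like any other File, so they could presumably be handed to a worker in the same way as files from a file input.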
Additional context
Here is an example worker:
and here is example client code for interacting with the worker: