Single-module components #435

Open
sunfishcode opened this issue Jan 9, 2025 · 3 comments

Comments

@sunfishcode
Member

Today, most components in practice contain 3 core modules (not even counting the preview1-to-preview2 adapter).

The reason this happens is that we have an import cycle. Every executable Wasm module compiled from a linear-memory language exports a linear memory, and typically imports functions which access that memory, like this:

(core module
    ...
    (type (;4;) (func (param i32 i32)))
    ...
    (import "wasi:http/[email protected]" "[method]outgoing-body.write" (func $_ZN4wasi8bindings4wasi4http5types12OutgoingBody5write10wit_import17hc3612d9fffb4f5c7E (;10;) (type 4)))
    ...
    (export "memory" (memory 0))
    ...
)

One of the i32 parameters to write is a pointer into the exported memory. This forms an implicit import cycle: the Wasm module imports a function, and that function needs to access a memory that it imports from the Wasm module.

The component model disallows import cycles; however, the component-model tooling knows how to break them automatically.

To do this, it first adds a module which defines a function table, but does not initialize it. This module has function exports to satisfy the original module's function imports; each export is a wrapper around call_indirect on an exported table:

(core module
    ...
    (table (;0;) 20 20 funcref)
    ...
    (export "$imports" (table 0))
    ...
    (export "2" (func $"indirect-wasi:http/[email protected][method]outgoing-body.write"))
    ...
    (func $"indirect-wasi:http/[email protected][method]outgoing-body.write" (;2;) (type 1) (param i32 i32)
      local.get 0
      local.get 1
      i32.const 2
      call_indirect (type 1)
    )
    ...
)

Because the function table isn't initialized here, this module doesn't import anything, so it can be instantiated first.

The third module imports that table and the actual functions, and initializes the table with them:

(core module
    ...
    (type (;1;) (func (param i32 i32)))
    ...
    (import "" "2" (func (;2;) (type 1)))
    ...
    (import "" "$imports" (table (;0;) 20 20 funcref))
    (elem (;0;) (i32.const 0) func 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19)
    ...
)

This last module is instantiated last, and this table initialization is all it does. In all, this breaks the cycle.
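
To make the instantiation order concrete, here is a simplified sketch of how the three modules might be wired together at the component level. It is not actual tooling output: the `write` import taking a `list<u8>` is a hypothetical stand-in for `outgoing-body.write`, the names `$Shim`, `$Main`, `$Fixup` and the `"host"` import name are made up for illustration, and the table has a single slot.

```wat
(component
  ;; Hypothetical host function standing in for outgoing-body.write.
  (import "write" (func $write (param "bytes" (list u8))))

  ;; Shim: defines the table and a call_indirect wrapper; imports nothing.
  (core module $Shim
    (type $t (func (param i32 i32)))
    (table (export "$imports") 1 1 funcref)
    (func (export "0") (type $t) (param i32 i32)
      local.get 0
      local.get 1
      i32.const 0
      call_indirect (type $t)))

  ;; Main: imports the wrapper and exports its linear memory.
  (core module $Main
    (import "host" "write" (func (param i32 i32)))
    (memory (export "memory") 1))

  ;; Fixup: imports the table and the real function, then fills slot 0.
  (core module $Fixup
    (import "" "0" (func (param i32 i32)))
    (import "" "$imports" (table 1 1 funcref))
    (elem (i32.const 0) func 0))

  ;; 1. The shim has no imports, so it can be instantiated first.
  (core instance $shim (instantiate $Shim))
  (alias core export $shim "0" (core func $indirect_write))

  ;; 2. The main module gets the wrapper in place of the real function.
  (core instance $main (instantiate $Main
    (with "host" (instance (export "write" (func $indirect_write))))))
  (alias core export $main "memory" (core memory $mem))

  ;; 3. Only now, with the memory available, can the import be lowered.
  (core func $write_lowered (canon lower (func $write) (memory $mem)))

  ;; 4. The fixup module patches the real function into the table.
  (alias core export $shim "$imports" (core table $imports))
  (core instance $fixup (instantiate $Fixup
    (with "" (instance
      (export "0" (func $write_lowered))
      (export "$imports" (table $imports))))))
)
```

Step 3 is where the cycle shows up: `canon lower` needs `$mem`, which only exists after `$Main` is instantiated, but `$Main` needs a `write` function in order to be instantiated.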

It's cool that the tooling knows how to do this automatically, but it's unfortunate that this is necessary in this common situation. The call_indirects add overhead, the extra modules make instantiation more complex, and all this extra code makes it difficult for people looking at components to understand how they work.

One approach to fixing this that's been discussed in various places is to have special library modules that just export a linear memory and perhaps also malloc and similar functions. This may still be desirable for dynamic linking use cases; however, that would still require two modules per component, so it's still desirable to have a way to avoid needing this for simple cases.

We can do this by adding support to the canonical ABI for modules that import their linear memories rather than export them. In wasm-ld, the --import-memory flag creates a module that imports its memory. If we use that, and add canonical-abi support for this mode, this should allow us to avoid the import cycles and the extra modules needed to break them.
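
As a rough illustration of the module shape this implies, here is a minimal hand-written sketch of a core module that imports its memory. It assumes the `"env"` `"memory"` import name that wasm-ld uses by default with `--import-memory`, as far as I know; the `"host"` `"write"` import and the `run` export are hypothetical.

```wat
(module
  ;; The memory is imported instead of exported (assumed default name).
  (import "env" "memory" (memory 1))
  ;; Hypothetical host function taking a (pointer, length) pair.
  (import "host" "write" (func $write (param i32 i32)))

  ;; Two bytes placed at offset 0 of the imported memory.
  (data (i32.const 0) "hi")

  (func (export "run")
    i32.const 0   ;; pointer into the imported memory
    i32.const 2   ;; length in bytes
    call $write))
```

Because the memory exists before this module is instantiated, whatever provides `write` can be given that memory up front, so no table or call_indirect wrapper is needed.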

@lukewagner
Member

(Sorry for the slow reply; returning from holidays) Technically, the Canonical ABI is agnostic to whether core modules import or export linear memory. E.g., today, you can write:

(component
  (import "./libc.wasm" (core module $Libc
    (export "memory" (memory 1))
  ))
  (core module $M
    (import "libc" "memory" (memory 1))
  )
  (core instance $libc (instantiate $Libc))
  (core instance $m (instantiate $M (with "libc" (instance $libc))))
)

However, the convention of importing-vs-exporting memory does get baked into the not-yet-finalized core module build target where indeed the convention is currently that memory is exported.

So is your suggestion to change the core module build target to import instead of export memory? If so, that sounds reasonable to me; I think it'd be great to encourage movement towards common shared libc/allocator modules like you're suggesting. I do have a vague recollection that it might require some toolchain work to change, though. #378 is still open to discuss this, so maybe comment there?

@sunfishcode
Member Author

sunfishcode commented Jan 15, 2025

I had pictured it as a canonical built-in, something like this:

(component
  (core memory $memory (canon memory.define 1 1))
  (core module $M
    (import "memory-instance" "memory" (memory 1 1))
  )
  (core instance $m (instantiate $M (with "memory-instance" (instance (export "memory" (memory $memory))))))
)

@lukewagner
Member

Historically the challenge with that idea has been that you also need a realloc function to go along with that memory. But ooh, with lazy lowering, realloc goes away! So indeed, yeah, what you're saying would be an easy and effective addition post-lazy-lowering.
