
Support Uint8Array input/output in @swc/wasm-typescript #9851

Open
joyeecheung opened this issue Jan 7, 2025 · 2 comments

joyeecheung commented Jan 7, 2025

Describe the feature

Currently, @swc/wasm-typescript takes a JS string as input and returns a JS string as output. This can incur a lot of unnecessary string transcoding overhead if the user needs the data UTF-8 encoded. It would be nice if swc accepted UTF-8 encoded data as input and returned UTF-8 encoded data as output, at least stored in Uint8Arrays.

In particular this would be useful for Node.js, which typically reads the source code as UTF-8 encoded buffers from disk first, and when integrating TypeScript into the compile cache, it needs to write the transpiled code as UTF-8 encoded data to disk as well.

Babel plugin or link to the feature description

No response

Additional context

And as far as I can tell, swc needs to internally convert these strings into UTF8-encoded data before performing transpilation. So something like this is very likely to happen:

  1. Users read the TypeScript code from disk, which is typically stored in UTF-8, so the UTF-8 input data is first read into a Uint8Array (or a Node.js Buffer, which is a subclass of Uint8Array).
  2. Since swc needs a string input, users have to convert that UTF-8 content into a JS string. In V8, a string's underlying storage is either Latin-1 (if the content fits) or UTF-16, so the data has to be transcoded.
  3. AFAICT swc needs to convert that JS string into UTF-8 encoded data in a Uint8Array and pass it into the Rust layer to be converted into a UTF-8 Rust string. That code is generated by wasm-bindgen using a TextEncoder.
  4. After transpilation is done, the result is converted again from a UTF-8 Rust string into a Uint8Array and then into a JS string. That is done by wasm-bindgen-generated code using a TextDecoder.
  5. The user needs to convert the JS string returned by swc into UTF-8 data in a Uint8Array again before writing it to disk, so that the result is stored in UTF-8.

If swc just supported UTF-8 input/output in Uint8Arrays, steps 2-5 could be skipped in the case where users don't need the input/output as JS strings for additional manipulation. Even if they do, steps 3-4 can be skipped by keeping the Uint8Arrays with UTF-8 data on the side.
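
To make the round trips concrete, here is a rough sketch of the two paths. It does not call swc: `transpileUtf8` is a hypothetical stand-in for the Rust transpiler (an identity transform here), and `transformString` only imitates the encode/decode pair that, as far as I can tell, the wasm-bindgen glue performs.

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder("utf-8");

// Hypothetical stand-in for the Rust side: UTF-8 bytes in, UTF-8 bytes out.
// A real transpiler would strip types; this just copies the input.
function transpileUtf8(input: Uint8Array): Uint8Array {
  return input.slice();
}

// Imitates a string-only API: the glue code encodes and decodes around the call.
function transformString(source: string): string {
  const utf8In = encoder.encode(source); // step 3: JS string -> UTF-8
  const utf8Out = transpileUtf8(utf8In);
  return decoder.decode(utf8Out); // step 4: UTF-8 -> JS string
}

// Path with a string-only API, for a caller that starts and ends with bytes.
function viaStrings(fileBytes: Uint8Array): Uint8Array {
  const source = decoder.decode(fileBytes); // step 2: UTF-8 -> JS string
  const code = transformString(source); // steps 3-4
  return encoder.encode(code); // step 5: JS string -> UTF-8
}

// Path with a bytes-in/bytes-out API: no transcoding on the JS side.
function viaBytes(fileBytes: Uint8Array): Uint8Array {
  return transpileUtf8(fileBytes);
}
```

With the stand-in, both paths produce the same bytes; the difference is the four conversions in `viaStrings` that `viaBytes` avoids.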

@kdy1 kdy1 added this to the Planned milestone Jan 8, 2025
@kdy1 kdy1 self-assigned this Jan 14, 2025
Member

kdy1 commented Jan 15, 2025

I added input support with #9879, but I'm not sure how I should determine the output type. Should I add an option?

@joyeecheung
Author

Nice, thanks! I think an output format option would work best, in case anyone else needs to feed in a string and get a buffer back, or the other way around.
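
For illustration only, one possible shape for such an option is sketched below. None of these names come from swc: `transformSketch`, the `output` option and its values are hypothetical, and `transpileStub` is an identity stand-in for the real transpiler.

```typescript
// Hypothetical option values; not part of any existing swc API.
type OutputFormat = "string" | "uint8array";

interface SketchOptions {
  output?: OutputFormat;
}

// Identity stand-in for the transpiler: UTF-8 bytes in, UTF-8 bytes out.
function transpileStub(input: Uint8Array): Uint8Array {
  return input.slice();
}

// Overloads let the return type follow the requested output format.
function transformSketch(
  input: string | Uint8Array,
  options: { output: "uint8array" },
): Uint8Array;
function transformSketch(
  input: string | Uint8Array,
  options?: { output?: "string" },
): string;
function transformSketch(
  input: string | Uint8Array,
  options: SketchOptions = {},
): string | Uint8Array {
  // Encode only when the caller passed a string.
  const bytes =
    typeof input === "string" ? new TextEncoder().encode(input) : input;
  const out = transpileStub(bytes);
  // Decode only when the caller asked for (or defaulted to) a string.
  return options.output === "uint8array" ? out : new TextDecoder().decode(out);
}
```

Defaulting to a string keeps existing callers working, while `{ output: "uint8array" }` covers both the buffer-to-buffer case and the mixed string-to-buffer case.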
