Currently, @swc/wasm-typescript takes a JS string as input and returns a JS string as output. This can add a lot of unnecessary string-transcoding overhead when the user actually needs the data UTF-8 encoded. It would be nice if swc accepted UTF-8 encoded input and returned UTF-8 encoded output, at least as Uint8Arrays.
In particular, this would be useful for Node.js, which typically reads source code from disk as UTF-8 encoded buffers first. When integrating TypeScript into the compile cache, it also needs to write the transpiled code to disk as UTF-8 encoded data.
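On the Node.js side, raw UTF-8 bytes are already what node:fs returns when no encoding is given, and what it accepts for writing. A minimal sketch using only node:fs (the temp file names and the transpiledBytes placeholder are made up for illustration; no swc call is involved):

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Create a throwaway source file so the sketch runs on its own.
const workDir = mkdtempSync(join(tmpdir(), "swc-utf8-"));
const sourcePath = join(workDir, "input.ts");
writeFileSync(sourcePath, "const n: number = 1;\n", "utf8");

// With no encoding argument, readFileSync returns a Buffer holding the raw UTF-8 bytes.
// Buffer is a subclass of Uint8Array, so it could go to a byte-based API as-is.
const sourceBytes = readFileSync(sourcePath);
console.log(sourceBytes instanceof Uint8Array); // true

// Hypothetical placeholder for bytes returned by a UTF-8 output mode.
// Here it is only a copy of the input; no transpilation happens.
const transpiledBytes: Uint8Array = new Uint8Array(sourceBytes);

// writeFileSync accepts a Uint8Array directly, so no string conversion is needed on the way out.
const outputPath = join(workDir, "input.js");
writeFileSync(outputPath, transpiledBytes);
```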
Babel plugin or link to the feature description
No response
Additional context
And as far as I can tell, swc needs to convert these strings into UTF-8 encoded data internally before performing transpilation, so something like this is very likely to happen:
1. Users read the TypeScript code from disk, where it is typically stored as UTF-8, so the input is first read into a Uint8Array (or a Node.js Buffer, which is a subclass of Uint8Array).
2. Since swc needs a string input, users have to convert that UTF-8 content into a JS string. In V8, this means transcoding it into either Latin-1 (if it fits) or UTF-16 for the underlying storage.
3. AFAICT swc then converts that JS string back into UTF-8 encoded data in a Uint8Array and passes it to the Rust layer, where it becomes a UTF-8 Rust string. This code is generated by wasm-bindgen and uses a TextEncoder.
4. After transpilation, the result is converted from a UTF-8 Rust string into a Uint8Array and then into a JS string. This is also done by wasm-bindgen-generated code, using a TextDecoder.
5. The user needs to convert the JS string returned by swc back into UTF-8 data in a Uint8Array before writing it to disk, so that the result is stored as UTF-8.
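The round trip above can be modelled in a few lines to make the transcoding passes visible. This is a rough sketch, not swc's code: stringToBytes and bytesToString only approximate the TextEncoder/TextDecoder calls the glue code makes, and rustTranspile is a hypothetical stand-in for the Rust side that strips a ": string" annotation at the byte level.

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder(); // UTF-8 by default

// Count every UTF-8 <-> JS string transcoding pass.
let conversions = 0;
function stringToBytes(s: string): Uint8Array { conversions++; return encoder.encode(s); }
function bytesToString(b: Uint8Array): string { conversions++; return decoder.decode(b); }

// Remove every occurrence of `needle` from `haystack`, working purely on bytes.
function removeAll(haystack: Uint8Array, needle: Uint8Array): Uint8Array {
  const out: number[] = [];
  for (let i = 0; i < haystack.length; ) {
    let match = needle.length > 0 && i + needle.length <= haystack.length;
    for (let j = 0; match && j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) match = false;
    }
    if (match) { i += needle.length; } else { out.push(haystack[i]); i++; }
  }
  return Uint8Array.from(out);
}

// Hypothetical stand-in for the Rust transpiler: UTF-8 bytes in, UTF-8 bytes out.
const ANNOTATION = encoder.encode(": string");
function rustTranspile(input: Uint8Array): Uint8Array {
  return removeAll(input, ANNOTATION);
}

// Step 1: source bytes as read from disk (simulated; not counted).
const fileBytes = encoder.encode("const greeting: string = 'héllo';\n");
// Step 2: the user decodes UTF-8 into a JS string for the string-based API.
const sourceText = bytesToString(fileBytes);
// Step 3: glue code re-encodes the string to UTF-8 for the Rust layer.
const wasmInput = stringToBytes(sourceText);
const wasmOutput = rustTranspile(wasmInput);
// Step 4: glue code decodes the Rust output into a JS string.
const outputText = bytesToString(wasmOutput);
// Step 5: the user re-encodes that string to UTF-8 before writing it to disk.
const outBytes = stringToBytes(outputText);

// "é" takes 2 bytes in UTF-8 but 1 UTF-16 code unit, so the sizes differ.
console.log(fileBytes.length, sourceText.length); // 35 34
console.log(conversions); // 4
```

None of the four counted passes changes the content; only the transpiler step does.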
If swc supported UTF-8 input/output in Uint8Arrays, steps 2-5 could be skipped whenever users don't need the input/output as JS strings for additional manipulation. Even if they do, they can skip steps 3-4 by keeping the Uint8Arrays with the UTF-8 data on the side.
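A minimal sketch of both cases, assuming a byte-oriented entry point existed. The name transpileBytes is hypothetical (not an existing @swc/wasm-typescript export), and its body is a placeholder that just copies the bytes.

```typescript
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Hypothetical UTF-8 in / UTF-8 out entry point of the kind requested here.
// Placeholder body: returns a copy; a real version would run the Rust transpiler.
function transpileBytes(input: Uint8Array): Uint8Array {
  return input.slice();
}

// Step 1 remains: the source arrives as UTF-8 bytes (simulated here).
const fileBytes = encoder.encode("let n = 1;\n");

// Case A: no JS string is ever needed, so steps 2-5 disappear entirely.
const outBytes = transpileBytes(fileBytes);

// Case B: the caller also wants the text (e.g. for extra manipulation).
// It decodes once for its own use but still hands the original bytes to the
// transpiler, so the glue-code round trip (steps 3-4) is avoided.
const textForInspection = decoder.decode(fileBytes);
const outBytesB = transpileBytes(fileBytes);
```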
Nice, thanks! I think an output format option would work best, in case anyone else needs to pass in a string and get a buffer back, or the other way around.
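One way such an option could look, as a sketch only: transform, TransformOptions and outputFormat are hypothetical names, not swc's API, and transpileCore is a placeholder for the byte-based Rust core. Overloads let the return type follow the requested format, and encoding/decoding happens only when a string is actually passed in or asked for.

```typescript
// Hypothetical option shape for choosing the output representation.
type OutputFormat = "string" | "buffer";
interface TransformOptions<F extends OutputFormat> {
  outputFormat: F;
}

const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Placeholder for the wasm/Rust core, which works on UTF-8 bytes (copies them here).
function transpileCore(input: Uint8Array): Uint8Array {
  return input.slice();
}

// The overloads tie the return type to the requested output format.
function transform(input: string | Uint8Array, options: TransformOptions<"string">): string;
function transform(input: string | Uint8Array, options: TransformOptions<"buffer">): Uint8Array;
function transform(input: string | Uint8Array, options: TransformOptions<OutputFormat>): string | Uint8Array {
  // Encode only when the caller passed a JS string; bytes pass straight through.
  const inputBytes = typeof input === "string" ? encoder.encode(input) : input;
  const outputBytes = transpileCore(inputBytes);
  // Decode only when the caller asked for a JS string.
  return options.outputFormat === "string" ? decoder.decode(outputBytes) : outputBytes;
}

// String in, bytes out; and bytes in, string out.
const fromString = transform("let x = 1;\n", { outputFormat: "buffer" });
const fromBytes = transform(encoder.encode("let y = 2;\n"), { outputFormat: "string" });
console.log(fromString instanceof Uint8Array, fromBytes);
```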