
Use the compilation cache when running typescript files through --experimental-transform-types #54741

Open
ShenHongFei opened this issue Sep 3, 2024 · 8 comments · May be fixed by #56629
Labels
feature request (Issues that request new features to be added to Node.js.), loaders (Issues and PRs related to ES module loaders), strip-types (Issues or PRs related to strip-types support)

Comments

@ShenHongFei
Contributor

What is the problem this feature will solve?

The flags --experimental-strip-types and --experimental-transform-types enable Node.js to run almost all TypeScript files. #54283

Running .ts files directly is a great feature. I recently removed the step of compiling .ts files to .js: the project now starts directly with node entry.ts and imports other .ts modules directly.

However, in a large project with many files, compiling the .ts files takes up a lot of startup time (about 200 ms when everything is .js, versus about 700 ms with .ts). With a compilation cache, Node.js could skip recompiling an unchanged .ts file and reuse the cached .js output, which would make startup much faster.

What is the feature you are proposing to solve the problem?

After I got this idea, I tried a simple modification of lib/internal/modules/esm/translators.js that adds a local file cache. With it, running a .ts file for the second time is just as fast as running the equivalent .js.

I hope someone can integrate a .ts compilation cache with the existing .js compile cache in Node.js.

diff --git a/lib/internal/modules/esm/translators.js b/lib/internal/modules/esm/translators.js
index b1e7b860..8eb172b1 100644
--- a/lib/internal/modules/esm/translators.js
+++ b/lib/internal/modules/esm/translators.js
@@ -24,8 +24,9 @@ const {
 
 const { BuiltinModule } = require('internal/bootstrap/realm');
 const assert = require('internal/assert');
-const { readFileSync } = require('fs');
-const { dirname, extname, isAbsolute } = require('path');
+const { readFileSync, writeFileSync } = require('fs');
+const { dirname, extname, basename, isAbsolute } = require('path');
+const crypto = require('crypto')
 const {
   assertBufferSource,
   loadBuiltinModule,
@@ -484,11 +485,38 @@ translators.set('commonjs-typescript', function(url, source) {
   return FunctionPrototypeCall(translators.get('commonjs'), this, url, code, false);
 });
 
+
+// --- patch: Compile .ts ESM files with TypeScript and cache the result
+const filepath_cache = process.env.NODE_COMPILE_CACHE + '/'
+
 // Strategy for loading an esm TypeScript module
 translators.set('module-typescript', function(url, source) {
-  emitExperimentalWarning('Type Stripping');
-  assertBufferSource(source, false, 'load');
-  const code = stripTypeScriptTypes(stringify(source), url);
-  debug(`Translating TypeScript ${url}`);
-  return FunctionPrototypeCall(translators.get('module'), this, url, code, false);
-});
+    emitExperimentalWarning('Type Stripping');
+    assertBufferSource(source, false, 'load')
+    
+    const str_source = stringify(source)
+    
+    const [
+      code = stripTypeScriptTypes(str_source, url),
+      filepath_js
+    ] = (() => {
+        const filepath_js = filepath_cache +
+            url.slice('file:///'.length).replaceAll(':', '_').replaceAll('/', '_').slice(0, -2) +
+            crypto.hash('sha256', str_source).slice(0, 8) +
+            '.js'
+        
+        try {
+            return [readFileSync(filepath_js, 'utf-8')]
+        } catch { }
+        
+        return [undefined, filepath_js]
+    })()
+    
+    if (filepath_js)
+        try {
+            writeFileSync(filepath_js, code)
+        } catch { }
+    
+    return FunctionPrototypeCall(translators.get('module'), this, url, code, false)
+})
+

@nodejs/typescript
@marco-ippolito @joyeecheung

What alternatives have you considered?

No response

@ShenHongFei added the feature request label Sep 3, 2024
@marco-ippolito
Member

This is interesting; we definitely should cache. I think we already have a cache in place for .js files, but I'm not 100% sure.

@avivkeller added the loaders and strip-types labels Sep 3, 2024
@avivkeller moved this from Awaiting Triage to Triaged in Node.js feature requests Sep 3, 2024
@joyeecheung
Member

joyeecheung commented Sep 3, 2024

I think the idea is great, though the implementation definitely needs more polishing:

  1. It will need to work with the compile cache directory structure (documented in the source next to the "// Directory structure:" comment).
  2. There needs to be some checksum in the file to defend against changes to the source code / cache (this is what we already have for JS code caches, see the "// Layout of a cache file:" comment).
  3. I am not sure whether it should be a two-tier TS -> JS + JS -> code cache, or just a TS -> code cache. I suppose if we expect people to commonly use custom loaders to do anything before or after the TS -> JS transformation, then it needs to be two-tier, because the TS -> JS phase is exposed to loaders.
  4. Also, the checksum should probably include TypeScript-specific flags (e.g. whether transform is done); switching from type-stripping to transform-types or vice versa should result in cache misses. A sketch of such a cache key follows below.
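
For illustration only, a rough sketch of a cache key covering points 2 and 4 above (the function and field names here are made up, not the actual compile cache format):

// Hypothetical sketch: a cache key derived from the Node.js version, the
// TypeScript mode and the source text, so that editing the file or switching
// between --experimental-strip-types and --experimental-transform-types both
// invalidate the cached output.
'use strict';
const { createHash } = require('node:crypto');

function tsCacheKey(source, { mode = 'strip-types', nodeVersion = process.version } = {}) {
  return createHash('sha256')
    .update(nodeVersion)     // the cache must not survive a Node.js upgrade
    .update('\0')
    .update(mode)            // 'strip-types' or 'transform-types'
    .update('\0')
    .update(source)          // the raw TypeScript source
    .digest('hex');
}

// e.g. tsCacheKey(readFileSync('entry.ts', 'utf8'), { mode: 'transform-types' })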

@RobsonTrasel

Wow, the proposed solution is pretty cool

@marco-ippolito
Member

I started playing with it, and I realized it probably doesn't make sense. Users should transpile to .js with a compilation step and then cache. There is no point in persisting transpiled TS on disk, since that is what tsc does. You can already cache the compiled .js module.

@ShenHongFei
Contributor Author

The main benefit of a compilation cache is startup speed: it saves processor time by trading disk space for it. After all, there is no obvious downside to adding a cache.

TypeScript is, like JavaScript, an interpreted language rather than a compiled one. Compiling all .ts files ahead of time and then running them is not very user-friendly; it always feels like two steps in spirit. Compilation and type checking should be a separate, optional process rather than a must-have.

In a Node.js project I want to write import { xxx } from 'b.ts' in a.ts and have the import just work, which requires the following four options in tsconfig.json:

{
    "compilerOptions": {
        "module": "ESNext",
        "moduleResolution": "Bundler",
        "noEmit": true,
        "allowImportingTsExtensions": true
    }
}

Because "allowImportingTsExtensions": true requires "noEmit": true,, I can't generate any files with tsc, and Typescript will not automatically modify the suffix of the imported file.

Many other JavaScript runtimes, such as Deno and Bun, support running .ts files directly instead of compiling first, and their performance is very good. I don't know whether they cache.

@marco-ippolito
Member

With TypeScript 5.7 there is an emit flag compatible with strip-types (https://devblogs.microsoft.com/typescript/announcing-typescript-5-7-beta/), so you can now transpile.
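
Presumably this refers to the --rewriteRelativeImportExtensions option from the TypeScript 5.7 beta, which rewrites relative .ts import specifiers to .js in the emitted output. Under that assumption (and without verifying every option combination), the tsconfig.json from the earlier comment could drop noEmit roughly like this:

{
    "compilerOptions": {
        "module": "ESNext",
        "moduleResolution": "Bundler",
        "allowImportingTsExtensions": true,
        "rewriteRelativeImportExtensions": true
    }
}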

@joyeecheung
Member

joyeecheung commented Jan 6, 2025

Added to #52696. As mentioned in #54741 (comment), this should tie into the existing compilation cache storage so that it works with the module.enableCompileCache() JS API etc., and preferably as a two-tier cache to minimize the impact of loader hooks. Additionally, to get the most out of it, I think it's worth addressing the other TODO in #52696 (comment): passing UTF-8 encoded content in buffers directly instead of passing strings around, as it's somewhat silly to go through UTF-8 transcoding repeatedly across the cache layers/swc. (When I was profiling the loading of tsc itself, for example, UTF-8 transcoding accounted for 18% of the overhead after the compile cache was turned on, so this can be non-trivial.)

I can look into it in the coming weeks, as I was also finishing the multi-threaded writes of the cache. Technically we could make ESM TypeScript transpilation multi-threaded too, though that would likely require using swc natively instead of through Wasm; otherwise the coordination overhead of multi-threaded JS may cancel out the benefit.
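
For reference, a minimal sketch of how the existing compile cache is already enabled from JS today (a TS -> JS tier would presumably need to share the same storage so that this API keeps working):

'use strict';
// Opt into the on-disk compile cache programmatically; this has the same
// effect as setting the NODE_COMPILE_CACHE environment variable. Call it as
// early as possible (entry point or preload script) so that subsequently
// loaded modules can hit the cache.
const { enableCompileCache } = require('node:module');

const result = enableCompileCache(); // optionally pass a cache directory path
console.log(result); // reports whether the cache was enabled and where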

@joyeecheung
Member

joyeecheung commented Jan 7, 2025

Started with a prototype here. Technically this just needs one additional layer of TS -> JS cache; the existing JS -> bytecode cache is applied automatically afterwards.

The numbers already look promising (for the benchmark I took ./deps/v8/tools/turbolizer/node_modules/typescript/lib/lib.dom.d.ts, which is full of types), even though there are still plenty of TODOs to skip redundant UTF-8 transcoding to get the most out of it. In particular, it would be great if swc supported input/output from/to a Uint8Array containing UTF-8 encoded data; IIUC it already works on UTF-8 input in the Rust layer. For now it is rather silly that the following still needs to happen:

  1. We convert the input V8 string (Latin-1 or UTF-16) into UTF-8 to compute the hash, then discard the UTF-8 copy
  2. swc needs to internally convert the input V8 string into UTF-8 again to transpile it
  3. swc needs to convert its UTF-8 output into a V8 string to give it back to us
  4. We convert that V8 string output into UTF-8 again to save it to disk as UTF-8.

If swc just supported UTF-8 -> UTF-8 buffers, we could save steps 2-3.
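
As an illustration of the buffer-only flow argued for above, here is a sketch where transpileTs is a hypothetical stand-in for an swc binding that accepts and returns UTF-8 encoded bytes (no such binding exists today):

// Hypothetical sketch: keep the source as UTF-8 bytes end to end, so the
// hash, the (imaginary) transpiler and the cache file never force a
// V8 string <-> UTF-8 round trip.
'use strict';
const { createHash } = require('node:crypto');
const { readFileSync, writeFileSync } = require('node:fs');

function loadWithTsCache(path, cacheDir, transpileTs) {
  const source = readFileSync(path);                              // UTF-8 bytes, never decoded
  const key = createHash('sha256').update(source).digest('hex');  // hash the bytes directly
  const cachePath = `${cacheDir}/${key}.js`;
  try {
    return readFileSync(cachePath);                               // cached UTF-8 JS bytes
  } catch {
    const js = transpileTs(source);                               // UTF-8 in, UTF-8 out (hypothetical)
    writeFileSync(cachePath, js);                                 // written without re-encoding
    return js;
  }
}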

❯ export NODE_DEBUG_NATIVE=
❯ export NODE_COMPILE_CACHE=
❯ hyperfine "out/Release/node ./lib.dom.d.ts" --warmup=5
Benchmark 1: out/Release/node ./lib.dom.d.ts
  Time (mean ± σ):     143.2 ms ±  25.4 ms    [User: 281.5 ms, System: 15.6 ms]
  Range (min … max):   133.2 ms … 252.4 ms    21 runs

❯ export NODE_COMPILE_CACHE=.tmp/.cache
❯ hyperfine "out/Release/node ./lib.dom.d.js" "out/Release/node ./lib.dom.d.ts" "./node_main ./lib.dom.d.ts"  --warmup 5
Benchmark 1: out/Release/node ./lib.dom.d.js
  Time (mean ± σ):      21.5 ms ±  10.8 ms    [User: 15.9 ms, System: 3.7 ms]
  Range (min … max):    19.7 ms … 134.9 ms    118 runs

Benchmark 2: out/Release/node ./lib.dom.d.ts
  Time (mean ± σ):      26.2 ms ±  19.0 ms    [User: 18.3 ms, System: 4.3 ms]
  Range (min … max):    22.4 ms … 165.5 ms    105 runs

Benchmark 3: ./node_main ./lib.dom.d.ts
  Time (mean ± σ):     149.3 ms ±  46.2 ms    [User: 281.3 ms, System: 16.2 ms]
  Range (min … max):   131.2 ms … 291.3 ms    21 runs

Summary
  'out/Release/node ./lib.dom.d.js' ran
    1.22 ± 1.08 times faster than 'out/Release/node ./lib.dom.d.ts'
    6.95 ± 4.11 times faster than './node_main ./lib.dom.d.ts'

I used the main branch for comparison to rule out the time saved by JS compilation, although after type stripping there isn't much left in a .d.ts file anyway.
