
test_runner: add bail out #56490

Open

pmarchini wants to merge 13 commits into main from test_runner/bail-out
Conversation

pmarchini (Member)

Catching up with the last attempt (#48919), this is another try at introducing the bailout feature.
I'm opening this PR as a draft to discuss the implementation and because refactoring may be needed if this approach is well-received by the community.

Note: In some tests, I had to enforce a concurrency=1 setting because testing the bailout feature across multiple files concurrently proved to be extremely flaky.

@nodejs-github-bot (Collaborator)

Review requested:

  • @nodejs/test_runner

nodejs-github-bot added the lib / src (Issues and PRs related to general changes in the lib or src directory) and needs-ci (PRs that need a full CI run) labels on Jan 6, 2025
pmarchini force-pushed the test_runner/bail-out branch from 89f64db to 55874e8 on January 6, 2025 19:18
cjihrig (Contributor) commented Jan 6, 2025

In some tests, I had to enforce a concurrency=1 setting because testing the bailout feature across multiple files concurrently proved to be extremely flaky.

Can you explain more about what was flaky? I'm guessing you mean the tests were at different points of execution when they received the bail out signal. I think the best way to work around this is to use test fixtures that never finish.

pmarchini (Member, Author)

Can you explain more about what was flaky? I'm guessing you mean the tests were at different points of execution when they received the bail out signal. I think the best way to work around this is to use test fixtures that never finish.

Hey @cjihrig, you're guessing right!

Your proposed solution sounds good.

I think we should add a test as follows:

  • First test file: a test that fails after a long timeout (maybe 5-10 seconds) to allow the other test file processes to be spawned correctly.
  • Second test file: a test with an infinite loop.

WDYT? (rough sketch of the two fixtures below)
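
A minimal sketch of those two fixture files, assuming node:test; the paths, the exact delay, and the shape of the never-finishing test are placeholders, not the final fixtures:

// test/fixtures/bail/first.test.js (hypothetical path)
// Fails after a delay so the other test file has time to be spawned.
'use strict';
const { test } = require('node:test');
const { setTimeout: wait } = require('node:timers/promises');
const assert = require('node:assert');

test('fails after a delay', async () => {
  await wait(5000);
  assert.fail('trigger the bail out');
});

// test/fixtures/bail/second.test.js (hypothetical path)
// Never finishes on its own; it should only stop when cancelled by the bail out.
'use strict';
const { test } = require('node:test');

test('never finishes', () => new Promise(() => {}));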

lib/internal/test_runner/runner.js (outdated review thread, resolved)
doc/api/test.md (outdated review thread, resolved)
doc/api/test.md (outdated)
By enabling this flag, the test runner will exit the test suite early
when it encounters the first failing test, preventing
the execution of subsequent tests.
Already running tests will be canceled, and no further tests will be started.
Contributor:

I'm not sure if we need to get into this small of a detail, but technically there will be a window of time where new tests may be started before being cancelled. Maybe we can say something like "The test runner will cancel all remaining tests."

Member Author:

done

doc/api/test.md (outdated review thread, resolved)
doc/api/test.md (outdated)
@@ -1483,6 +1512,11 @@ changes:
does not have a name.
* `options` {Object} Configuration options for the test. The following
properties are supported:
* `bail` {boolean}
Contributor:

I don't think this should be an option to test(). Should it be part of run() instead?

Member Author:

Yes, my bad, the bail option is part of run(). I'll update the doc ASAP!
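
For reference, the option under discussion would be passed to run() roughly like this (bail is the option this PR proposes; the file paths and reporter choice are just placeholders):

const { run } = require('node:test');
const { spec } = require('node:test/reporters');

run({
  files: ['./a.test.js', './b.test.js'], // placeholder paths
  bail: true, // proposed option: stop the run at the first failing test
})
  .compose(spec)
  .pipe(process.stdout);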

Member Author:

fixed

test/fixtures/test-runner/output/bail-spec.js (outdated review thread, resolved)
test/parallel/test-runner-run.mjs (outdated review thread, resolved)
it('should only allow a boolean in options.bail', async () => {
[Symbol(), {}, [], () => { }, 0, 1, 0n, 1n, '', '1', Promise.resolve([])]
.forEach((bail) => assert.throws(() => run({ bail }), {
code: 'ERR_INVALID_ARG_TYPE'
Contributor:

Just out of curiosity, are we sure that these errors actually correspond to the use of bail? For example, it's possible that ERR_INVALID_ARG_TYPE or ERR_INVALID_ARG_VALUE are thrown by run() completely unrelated to the bail option.

Member Author:

I've written the test following a red-green cycle.
I'll take another look and I'll add a comment here 😁

Contributor:

Another option is to update the assertion to check the code and the message.

Member Author:

I agree, we should probably do the same for the other "validation" tests as well.
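
Something along these lines, assuming validateBoolean produces the standard ERR_INVALID_ARG_TYPE message for options.bail (the regexp would need to match the actual validator output):

[Symbol(), {}, [], () => {}, 0, 1, 0n, 1n, '', '1', Promise.resolve([])]
  .forEach((bail) => assert.throws(() => run({ bail }), {
    code: 'ERR_INVALID_ARG_TYPE',
    message: /The "options\.bail" property must be of type boolean/,
  }));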

Member Author:

done

test/parallel/test-runner-run.mjs (outdated review thread, resolved)

// TODO(pmarchini): Bailout is not supported in watch mode yet but it should be.
// We should enable this test once it is supported.
it.todo('should handle the bail option with watch mode');
Contributor:

Should this be documented?

Member Author:

Added a note to the documentation.

cjihrig (Contributor) commented Jan 6, 2025

I think we should add a test as follows:

  • First test file: a test that fails after a long timeout (maybe 5-10 seconds) to allow the other test file processes to be spawned correctly.
  • Second test file: a test with an infinite loop.

There are all sorts of annoying edge cases to account for here, and it might be worth doing a survey of how the tap, mocha, and vitest runners handle bailing out when things are running in parallel. It's much more straightforward when only one thing is running. But, for example, if the very first test in the first process fails, do we bother spawning the other child processes at all? Or, in a bail out situation, how important is it to have an accurate summary at the end of the test run, with correct counts for total tests, cancelled tests, etc.?

pmarchini force-pushed the test_runner/bail-out branch from 55874e8 to f1c269c on January 10, 2025 17:36
pmarchini (Member, Author) commented Jan 12, 2025

@cjihrig I'm still checking different runners, and it seems that the behaviour is mostly "non-standard".
Even after bailing out, Vitest returns the list of all the skipped test files.

Mocha stops execution and returns a partial result without reporting the full list of cancelled tests.

      const result = await runMochaAsync('options/parallel/test-*', [
        '--parallel',
        '--bail'
      ]);
      // we don't know _exactly_ how many tests will be skipped here
      // due to the --bail, but the number of tests completed should be
      // less than the total, which is 5.
      return expect(
        result.passing + result.pending + result.failing,
        'to be less than',
        5
      );

Looking at tests like this one, I have the impression that testing of the feature itself follows a "best effort" approach in more than one tool.

While I'm still checking other examples, I think the most common use case for bailout is to stop execution as soon as possible in a CI/automation environment.
In this scenario, IMHO, all that matters is failing fast and exiting on failure.
I'm not even sure it makes sense to provide a report after a bail at all; if a report is provided, I'd prefer an output that gives an exact trace of the actual run (so a partial one).
WDYT?

Regarding the tests, I was thinking about:

  1. sequential with a single file
  2. sequential, isolation none
  3. sequential, isolation none, with more than one test file
  4. parallel with 2 test files (1 with some "loading time" and 1 with an infinite waiting loop)
  5. parallel with 2+x test files and concurrency fixed to 2, in order to cover the behaviour of a test file that should not even start its run (see the run() sketch at the end of this comment)
  6. order of the hooks while bailing out, to ensure the hooks run as expected
  7. order of the hooks while bailing out, multi-file

Do you have any suggestions / other behaviours you think we should ensure?
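
For scenario 5, the run() invocation could look roughly like this (option names follow this PR's proposal; the files and counts are only illustrative):

const { run } = require('node:test');

// Three files but only two worker processes at a time, so once the bail
// triggers, at least one file should never be started at all.
run({
  files: ['./one.test.js', './two.test.js', './three.test.js'],
  concurrency: 2,
  bail: true,
});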

pmarchini marked this pull request as ready for review on January 12, 2025 21:51
codecov bot commented Jan 12, 2025

Codecov Report

Attention: Patch coverage is 97.93814% with 2 lines in your changes missing coverage. Please review.

Project coverage is 89.20%. Comparing base (062ae6f) to head (db86232).
Report is 73 commits behind head on main.

Files with missing lines Patch % Lines
lib/internal/test_runner/runner.js 95.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #56490      +/-   ##
==========================================
+ Coverage   89.12%   89.20%   +0.08%     
==========================================
  Files         662      662              
  Lines      191555   191906     +351     
  Branches    36859    36948      +89     
==========================================
+ Hits       170722   171190     +468     
+ Misses      13694    13557     -137     
- Partials     7139     7159      +20     
Files with missing lines Coverage Δ
lib/internal/test_runner/harness.js 92.83% <100.00%> (+0.16%) ⬆️
lib/internal/test_runner/reporter/spec.js 96.29% <100.00%> (+0.10%) ⬆️
lib/internal/test_runner/reporter/tap.js 95.43% <100.00%> (+0.06%) ⬆️
lib/internal/test_runner/reporter/utils.js 96.84% <100.00%> (+0.13%) ⬆️
lib/internal/test_runner/test.js 96.98% <100.00%> (+0.05%) ⬆️
lib/internal/test_runner/tests_stream.js 92.13% <100.00%> (+0.41%) ⬆️
lib/internal/test_runner/utils.js 58.79% <100.00%> (+2.58%) ⬆️
src/node_options.cc 87.96% <100.00%> (+0.03%) ⬆️
src/node_options.h 98.33% <ø> (+0.01%) ⬆️
lib/internal/test_runner/runner.js 90.30% <95.00%> (+0.79%) ⬆️

... and 90 files with indirect coverage changes

cjihrig (Contributor) left a comment

Left some comments. I have some concerns:

  • The patch is pretty invasive for a feature that is not part of the default behavior. I'm not sure it needs to be so invasive.
  • There seems to be a good bit of work being done when it is not necessary. Have you been able to do any performance comparisons with these changes?
  • The tests (thank you for adding so many tests!) seem too reliant on timers, which usually work fine locally but are a recipe for flakiness in the CI.

@@ -57,6 +57,7 @@ class SpecReporter extends Transform {
#handleEvent({ type, data }) {
switch (type) {
case 'test:fail':
if (data.details?.error?.failureType === kTestBailedOut) break;
Contributor:

I'm not a big fan of needing to check this for every failure, even if bail out mode is not enabled. Couldn't we break out in the test:bail event?

Member Author:

We could, but in that case we would lose the "report" section.

Couldn't we break out in the test:bail event?

Specifically, how would you do that? Would you finalise the reporter, or propagate the abort up to the test runner root?

Member:

@pmarchini we also need to account for the fact that these changes impact custom reporters.
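
To illustrate that concern: a bare-bones custom reporter written against today's event stream would treat bailed-out tests as ordinary failures unless it adds the same failureType check the built-in reporters get in this PR (sketch; the event names are the existing node:test reporter events):

// Custom reporter as an async generator, per the node:test reporter API.
module.exports = async function* customReporter(source) {
  for await (const { type, data } of source) {
    if (type === 'test:fail') {
      // Without a kTestBailedOut check like the one added to spec/tap above,
      // this reports cancelled-by-bail tests as regular failures.
      yield `failed: ${data.name}\n`;
    } else if (type === 'test:pass') {
      yield `passed: ${data.name}\n`;
    }
  }
};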

@@ -33,6 +33,7 @@ async function * tapReporter(source) {
for await (const { type, data } of source) {
switch (type) {
case 'test:fail': {
if (data.details?.error?.failureType === lazyLoadTest().kTestBailedOut) break;
Contributor:

Same comment here.

Also, why is bailing out only supported in the spec and tap reporters? What about the dot reporter for example?

Member Author:

I haven't implemented all the reporters yet, as I'm still not convinced by the bail implementation itself.
I'll fix this before landing if we're able to reach an agreement on the implementation 🚀

@@ -663,6 +688,15 @@ function run(options = kEmptyObject) {

validateStringArray(argv, 'options.argv');
validateStringArray(execArgv, 'options.execArgv');
validateBoolean(bail, 'options.bail');
Contributor:

I think this should probably follow the same pattern as forceExit does around line 600 so that we don't need to perform all of this extra validation for no reason.
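
If I'm reading the suggestion right, the idea is to only run the validation when the option was actually supplied, roughly like this (a sketch, not the exact runner.js code; how forceExit gates its own validation may differ in detail):

// Skip the boolean check entirely when the caller didn't pass the option.
if (bail != null) {
  validateBoolean(bail, 'options.bail');
}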

'test did not finish before its parent and was cancelled',
kCancelledByParent,
),
const cancelledError = this.root.bailed ? new ERR_TEST_FAILURE(
Contributor:

It seems wasteful to create this error unconditionally even though it might not be used.
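
One way to avoid paying for the allocation up front would be to defer construction to the point where the error is actually used, e.g. behind a small helper that is only called on the cancel path (purely illustrative; the bail-out message text and the surrounding test.js code are assumptions here):

// Only construct the error when a test is actually being cancelled.
const getCancelledError = () =>
  this.root.bailed ?
    new ERR_TEST_FAILURE('test was cancelled due to bail out', kTestBailedOut) :
    new ERR_TEST_FAILURE(
      'test did not finish before its parent and was cancelled',
      kCancelledByParent,
    );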

@@ -763,6 +770,13 @@ class Test extends AsyncResource {

this.passed = false;
this.error = err;

if (this.bail && !this.root.bailed) {
Contributor:

I'm not a big fan of adding another way to cancel tests when we already have logic for that.

Member Author:

Calling root.postRun was the only way I found to immediately propagate the "stop" to the entire test tree.
I'm currently trying to achieve the same using an AbortController, but I haven't found a viable solution yet.


it('passing test with delay', async (t) => {
await new Promise((resolve) => setTimeout(resolve, 200));
t.assert.ok(true);
Contributor:

Is this needed?

describe('first parallel file', () => {

it('passing test with delay', async (t) => {
await new Promise((resolve) => setTimeout(resolve, 200));
Contributor:

We have the timers/promises module to avoid needing this pattern.

I also have concerns that if the test relies on this 200ms timeout, there is a good chance that it will become flaky in the CI. If there needs to be coordination between the files, it might be worth creating a file or something instead of relying on timers.
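
For reference, the timers/promises version of that pattern (this is the existing node:timers/promises API, not something added by this PR):

const { it } = require('node:test');
const { setTimeout: wait } = require('node:timers/promises');

it('passing test with delay', async (t) => {
  await wait(200); // instead of new Promise((resolve) => setTimeout(resolve, 200))
  t.assert.ok(true);
});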

Member Author:

thanks for the advice, I'll try to follow your suggested approach! 🚀


describe('second parallel file', () => {
it('slow test that should be cancelled', async () => {
await new Promise((resolve) => setTimeout(resolve, 5000));
Contributor:

Same comment about using timers/promises. You can also just return the promise and the test runner will await it.
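
i.e., something like this sketch, with timers/promises again:

const { it } = require('node:test');
const { setTimeout: wait } = require('node:timers/promises');

// No async wrapper needed; the runner awaits the returned promise.
it('slow test that should be cancelled', () => wait(5000));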

it('infinite loop that should be cancelled', async () => {
// This should be cancelled by the bail
while (true) {
await new Promise((resolve) => setTimeout(resolve, 100));
Contributor:

Same comment about timers/promises. I'll stop pointing it out, but it exists in some other places in this PR.

@@ -0,0 +1,174 @@
'use strict';
Contributor:

There seems to be a lot of overlap with the changes to test/parallel/test-runner-run.mjs. That other file is getting pretty large, and --test already goes through run(), so I'd say eliminate any duplication that isn't needed.

pmarchini (Member, Author)

Hey @cjihrig, thanks as always for your review.
I totally understand your concerns.

Step by step:

The patch is pretty invasive for a feature that is not part of the default behavior. I'm not sure that it needs to be so invasive.

Do you have any suggestions on how to reduce the footprint while better integrating it into the current implementation? I'm gaining more confidence in the codebase as I work on it, but I’m sure I might still be missing something important.

There seems to be a good bit of work being done when it is not necessary. Have you been able to do any performance comparisons with these changes?

Not yet, but I’ll make sure to do that as soon as possible!

The tests (thank you for adding so many tests!) seem too reliant on timers, which usually works fine locally, but is a recipe for flakiness in the CI.

I agree. I’ll try to use your suggested approach to coordinate the files without relying on timers or other flaky solutions.
