
Tests which are expected to fail / known broken #224

Open
eli-schwartz opened this issue Dec 10, 2024 · 4 comments


@eli-schwartz

Cross-post from pytest-dev/pytest#13045

It appears that there isn't currently a provision for this. Various testing frameworks support this status (off the top of my head this includes at least GNU Automake, the Test Anything Protocol, Pytest, and the Meson Build System).

I'll quote GNU Automake's manual page on "Generalities about testing" because IMHO the GNU project is, as usual, very expressive about the reasoning behind concepts. Paragraph 2 is key here:

https://www.gnu.org/software/automake/manual/html_node/Generalities-about-Testing.html

Sometimes, tests can rely on non-portable tools or prerequisites, or simply make no sense on a given system (for example, a test checking a Windows-specific feature makes no sense on a GNU/Linux system). In this case, accordingly to the definition above, the tests can neither be considered passed nor failed; instead, they are skipped, that is, they are not run, or their result is in any case ignored for what concerns the count of failures and successes. Skips are usually explicitly reported though, so that the user will be aware that not all of the testsuite has been run.

It’s not uncommon, especially during early development stages, that some tests fail for known reasons, and that the developer doesn’t want to tackle these failures immediately (this is especially true when the failing tests deal with corner cases). In this situation, the better policy is to declare that each of those failures is an expected failure (or xfail). In case a test that is expected to fail ends up passing instead, many testing environments will flag the result as a special kind of failure called unexpected pass (or xpass).

Many testing environments and frameworks distinguish between test failures and hard errors. As we’ve seen, a test failure happens when some invariant or expected behavior of the software under test is not met. A hard error happens when e.g., the set-up of a test case scenario fails, or when some other unexpected or highly undesirable condition is encountered (for example, the program under test experiences a segmentation fault).

They are usually called XFAIL and XPASS (though meson calls them EXPECTEDFAIL and UNEXPECTEDPASS).

Both should be reported as distinct from regular passes/failures.
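For concreteness, here is a minimal pytest sketch of how the two statuses surface (the test names and reasons are made up):

```python
import pytest

@pytest.mark.xfail(reason="corner case not handled yet")
def test_known_broken():
    # Fails today -> reported as XFAIL, and does not count as a failure.
    assert 0 == 1

@pytest.mark.xfail(reason="marker should be removed once this is fixed")
def test_already_fixed():
    # Passes despite the marker -> reported as XPASS
    # (with strict=True pytest would instead fail the run,
    # so a stale marker gets noticed and removed).
    assert 1 + 1 == 2
```

Running `pytest -rxX` lists both in the short summary as xfailed / xpassed, separate from ordinary passes and failures.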

@marcphilipp
Member

Spock is a Java ecosystem testing framework maintained by @leonard84. It has a @PendingFeature annotation that you can put on tests ("features" in their lingo):

The use case is to annotate tests that can not yet run but should already be committed. The main difference to @Ignore is that the tests are executed, but test failures are ignored. If the test passes without an error, then it will be reported as a failure since the @PendingFeature annotation should be removed. This way the tests will become part of the normal tests instead of being ignored forever.

Instead of using special xfail and xpass statuses it reports tests as aborted and failed, respectively.
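For comparison, Python's built-in unittest behaves much like @PendingFeature via unittest.expectedFailure: the test still runs, a failure is recorded as an expected failure, and a pass is recorded as an unexpected success, which modern unittest (3.4+) counts as an unsuccessful run. A self-contained sketch (test names are illustrative):

```python
import unittest

class PendingFeatureExample(unittest.TestCase):
    @unittest.expectedFailure
    def test_not_implemented_yet(self):
        # Still executed; the assertion failure is recorded as an
        # "expected failure" and does not fail the run.
        self.assertEqual(1 + 1, 3)

    @unittest.expectedFailure
    def test_already_working(self):
        # Executed and passes; recorded as an "unexpected success",
        # which makes the run unsuccessful so the stale decorator
        # gets noticed and removed.
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    unittest.main()
```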

@leonard84 What's your opinion about using special status for these cases?

@eli-schwartz
Author

What does it mean for a test to be aborted? There are two mesonbuild statuses which might be possible to interpret as aborted:

  • what GNU calls a "hard error". Could be caused by e.g. broken CFLAGS, a bug in the runner itself, a pathological bug in your library... much more dangerous to ignore than a failed test
  • a timeout, which indicates either that the test needs to declare how long it is expected to run, or that it is hanging forever (meson by default doesn't allow tests to run for longer than 30 seconds unless annotated, precisely to prevent hangs). As a rule, it's not worth looking at the output logs for a timeout; you need a very different investigative approach. (And it might not be a problem at all -- you might just be running on an OS/CPU combo that is really slow, say building MIPS under qemu emulation. On the other hand, maybe that new feature is deadlocking.) Meson quite literally aborted that test from the outside -- it didn't get the opportunity to run to completion.

Is "aborted" intended as a kind of catchall?

If Spock treats expected failures as ignorable by mapping them to "aborted", then that indicates that aborted tests are not considered errors at all, which means my assumption about what "aborted" should be used for was completely off base!
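To make the failure-vs-hard-error distinction above concrete in pytest terms (just a sketch; the fixture and test names are made up): an unmet expectation in the test body is reported as FAILED, while an exception during set-up is reported as ERROR:

```python
import pytest

@pytest.fixture
def database():
    # Set-up blows up before the test body runs -> pytest reports the
    # test as an ERROR rather than FAILED: the "hard error" case, where
    # the expected behaviour was never even checked.
    raise RuntimeError("could not start test database")

def test_uses_database(database):
    assert database is not None

def test_ordinary_failure():
    # The test ran and the expectation was simply not met -> FAILED.
    assert "feature works" == "feature broken"
```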

@marcphilipp
Member

What does it mean for a test to be aborted?

We should definitely define these statuses clearly and explicitly!

In JUnit, "aborted" is similar to "skipped". While "skipped" means there was a condition that prevented the test from even being started (could be as simple as explicitly ignoring a test, checking for an env var, etc.), "aborted" means the test was started but during execution the test code decided that it couldn't complete, e.g. because of a more complicated condition that could only be evaluated midway.

@eli-schwartz
Author

That's an interesting definition...

Test frameworks such as python unittest/pytest allow marking a test as SKIP either by decorating the test definition with some condition evaluated during collection, such as an environment variable, or by calling skipTest() halfway through the test. No distinction is drawn between the two -- the logic is presumably that there's no semantic difference based on which stage detected the need to skip.
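As a sketch of those two styles in pytest (SERVICE_URL is a made-up variable; both paths end up reported simply as skipped):

```python
import os
import sys
import pytest

# Skip decided at collection time, before the test body ever runs.
@pytest.mark.skipif(sys.platform != "win32", reason="Windows-only feature")
def test_windows_specific_behaviour():
    assert True

# Skip decided halfway through, once a runtime condition is known.
def test_needs_optional_service():
    if not os.environ.get("SERVICE_URL"):
        pytest.skip("no service configured for this run")
    assert True
```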

Test frameworks such as GNU automake or Meson don't operate on test functions at all -- instead they collect and track test programs, and expect a test program to signal whether it gets skipped. I guess those always count as "ABORTED", to use your terminology, since there's no actual way to prevent a test from even starting, and the test protocol doesn't have a way to signal whether the test program decided to signal a SKIP based on checking an env var in main() or based on more complicated conditions.
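For illustration, a standalone test program speaking the exit-code protocol (automake and meson both use exit status 77 for skip and 99 for a hard error; the checked tool name is hypothetical):

```python
#!/usr/bin/env python3
"""Standalone test program for an exit-code based harness."""
import shutil
import sys

# The prerequisite check happens inside main(); the harness only ever
# sees the exit status, not where the decision to skip was taken.
if shutil.which("some-optional-tool") is None:
    print("SKIP: some-optional-tool not installed")
    sys.exit(77)   # 77 = skipped

# ... the actual test logic would go here ...
sys.exit(0)        # 0 = pass, other statuses = fail, 99 = hard error
```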

...

With regard specifically to mapping xfail -> aborted: reporters which use the term "xfail" traditionally consider it important to draw attention to these tests, because xfail tests indicate missing functionality that isn't a regression from the previous state of the code. When running the tests to see what shape the project is in, you want to be able to see on the dashboard that they do need fixing, and maybe go ahead and fix them if you can / have time; you just don't want to e.g. gate CI on those failures. I wouldn't want them intermingled with tests that signaled halfway through their execution that the feature being tested isn't available on the current system and therefore cannot be unit-tested.

So I would definitely recommend adding new special statuses for this.
