[BUG] rv-virt/citest: test_hello or test_pipe failed #14808
Comments
The Timeout Values are configured to One Minute or longer for some Python Tests. What if we reduce the Timeout Values?
https://github.com/search?q=repo%3Aapache%2Fnuttx+timeout%3D+language%3APython+path%3A%2F%5Etools%5C%2Fci%5C%2Ftestrun%5C%2F%2F&type=code
Update: Nope, doesn't work:
https://github.com/lupyuen/nuttx-build-farm/blob/main/run-job-macos.sh#L107-L131
Somehow the Timeout Value is hard-coded inside |
For now we patched the NuttX Mirror Repo: Kill the CI Test if it exceeds 2 hours. Also for Ubuntu Build Farm and macOS Build Farm.
CI Test will sometimes run for 6 hours (before getting auto-terminated by GitHub):
- apache#14808
- apache#14680
This is a problem because:
- It will increase our usage of GitHub Runners. Which may overrun the [GitHub Actions Budget](https://infra.apache.org/github-actions-policy.html) allocated by ASF.
- Suppose right after CI Test there's another build. If CI Test runs for all 6 hours, then the build after CI Test will never run.
For this PR: We assume that Every CI Job (e.g. risc-v-05) will complete normally within 2 hours. If any CI Job exceeds 2 hours: This PR will kill the CI Test Process `pytest` and allow the next build to run.
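The "kill the CI Test if it exceeds the limit" idea can be sketched roughly as below. This is only an illustration, not the actual patch: the helper name `run_with_deadline` is hypothetical, the sketch kills only the direct child process (the real fix targets the `pytest` process inside a CI job), and a sleeping child stands in for a hung test run.

```python
import subprocess
import sys


def run_with_deadline(cmd, limit_seconds):
    """Run cmd and kill it if it is still running after limit_seconds.

    Returns a tuple (exit_code, timed_out). Hypothetical helper for
    illustration; it only kills the direct child, not its descendants.
    """
    proc = subprocess.Popen(cmd)
    try:
        # Normal case: the job finishes within the limit.
        return proc.wait(timeout=limit_seconds), False
    except subprocess.TimeoutExpired:
        # The job overran: kill it so the next build can start.
        proc.kill()
        proc.wait()  # Reap the child so it does not linger as a zombie.
        return proc.returncode, True


if __name__ == "__main__":
    # Stand-in for a hung test run: sleeps far longer than the limit.
    hung_job = [sys.executable, "-c", "import time; time.sleep(60)"]
    exit_code, timed_out = run_with_deadline(hung_job, limit_seconds=1)
    print("timed out:", timed_out)
```

For the 2-hour limit described above, `limit_seconds` would be `2 * 60 * 60`.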
@xiaoxiang781216 Yep CI Test is still failing, according to NuttX Dashboard: https://github.com/NuttX/nuttx/actions/runs/12156309894/job/33899821697#step:7:88
## Start Docker Container for NuttX
sudo docker run \
-it \
ghcr.io/apache/nuttx/apache-nuttx-ci-linux:latest \
/bin/bash
## Inside Docker:
## We compile rv-virt:citest
cd
git clone https://github.com/apache/nuttx
git clone https://github.com/apache/nuttx-apps apps
pushd nuttx ; echo NuttX Source: https://github.com/apache/nuttx/tree/$(git rev-parse HEAD) ; popd
pushd apps ; echo NuttX Apps: https://github.com/apache/nuttx-apps/tree/$(git rev-parse HEAD) ; popd
cd nuttx
tools/configure.sh rv-virt:citest
make -j
qemu-system-riscv32 \
-M virt \
-bios ./nuttx \
-nographic
NuttShell (NSH) NuttX-12.7.0
nsh> uname -a
NuttX 12.7.0 5607eece84 Dec 11 2024 07:05:48 risc-v rv-virt
nsh> ps
PID GROUP PRI POLICY TYPE NPX STATE EVENT SIGMASK STACK USED FILLED COMMAND
0 0 0 FIFO Kthread - Ready 0000000000000000 0001952 0000908 46.5% Idle_Task
1 0 224 RR Kthread - Waiting Semaphore 0000000000000000 0001904 0000508 26.6% hpwork 0x8014b1e4 0x8014b210
2 0 100 RR Kthread - Waiting Semaphore 0000000000000000 0001896 0000508 26.7% lpwork 0x8014b1a0 0x8014b1cc
riscv_exception: EXCEPTION: Load access fault. MCAUSE: 00000005, EPC: 80008bfe, MTVAL: 01473e00
riscv_exception: PANIC!!! Exception = 00000005
dump_assert_info: Current Version: NuttX 12.7.0 5607eece84 Dec 11 2024 07:05:48 risc-v
dump_assert_info: Assertion failed panic: at file: common/riscv_exception.c:131 task: nsh_main process: nsh_main 0x8000a806
up_dump_register: EPC: 80008bfe |
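When scanning many CI logs for crashes like the one above, it may help to pull the fields out of the `riscv_exception` line automatically. The sketch below is an assumption-laden illustration: the function name `parse_exception` and the regular expression are hypothetical and based only on the single log line shown here. The cause names are the standard synchronous exception codes 0 to 7 for `mcause` in the RISC-V privileged specification.

```python
import re

# Standard RISC-V synchronous exception codes (mcause with interrupt bit clear).
# Only codes 0-7 are listed; anything else is reported as "unknown".
MCAUSE_NAMES = {
    0: "Instruction address misaligned",
    1: "Instruction access fault",
    2: "Illegal instruction",
    3: "Breakpoint",
    4: "Load address misaligned",
    5: "Load access fault",
    6: "Store/AMO address misaligned",
    7: "Store/AMO access fault",
}

# Pattern guessed from the one log line in this issue (an assumption).
EXCEPTION_PATTERN = re.compile(
    r"MCAUSE:\s*([0-9a-fA-F]+),\s*EPC:\s*([0-9a-fA-F]+),\s*MTVAL:\s*([0-9a-fA-F]+)"
)


def parse_exception(line):
    """Extract mcause, epc and mtval (as integers) from a crash log line.

    Returns None if the line does not look like an exception report.
    """
    match = EXCEPTION_PATTERN.search(line)
    if match is None:
        return None
    mcause, epc, mtval = (int(field, 16) for field in match.groups())
    return {
        "cause": MCAUSE_NAMES.get(mcause, "unknown"),
        "mcause": mcause,
        "epc": epc,
        "mtval": mtval,
    }


if __name__ == "__main__":
    sample = (
        "riscv_exception: EXCEPTION: Load access fault. "
        "MCAUSE: 00000005, EPC: 80008bfe, MTVAL: 01473e00"
    )
    print(parse_exception(sample))
```

The `epc` value is the address of the faulting instruction and `mtval` is the address that the load tried to access, so both are worth recording when comparing failures across runs.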
Apparently, #15075 decreased the available stack size and this makes |
As I have said earlier at #15165, it didn't fix the problem of the failing CI, but fixed the problem related to I continued to investigate it and I was able to check that the commit that broke the CI was 656883f. I couldn't investigate it further, but I think this gives us a start point. |
I will double-check. You can verify it with:
(and then substitute |
Thank you so much Tiago! I created a new Bug Report for the Load Access Fault at ltp_interfaces_pthread_barrierattr_init_2_1: |
Description / Steps to reproduce the issue
Since yesterday: `rv-virt/citest` has been failing `test_hello` onwards, or `test_pipe` onwards, hanging our CI Checks in GitHub and Build Farm. (GitHub will cancel it after 6 hours)
It might have been caused by one of these NuttX Commits:
Or maybe one of these NuttX Apps Commits:
Also when one test fails: Why do the rest of the tests take a loooong time to fail, hanging our CI Checks in GitHub and Build Farm?
Fail at test_hello onwards: https://github.com/NuttX/nuttx/actions/runs/11833005280/job/32970891697#step:7:143
Fail at test_pipe onwards: https://github.com/NuttX/nuttx/actions/runs/11850442831/job/33025374105#step:7:145
On which OS does this issue occur?
[OS: Linux]
What is the version of your OS?
Ubuntu LTS at GitHub Actions
NuttX Version
master
Issue Architecture
[Arch: risc-v]
Issue Area
[Area: Build System]
Verification