netlink walk calls may return EINTR (regression) #1121

Open
saj opened this issue Nov 14, 2024 · 3 comments

saj commented Nov 14, 2024

Downstream users of the CNI plugins have been observing occasional interrupted system call failures. The simple loopback plugin is also affected.

Calls like netlink.AddrList may return EINTR as of vishvananda/netlink v1.2.1.
vishvananda/netlink@aa4f20d
Prior to this commit, the (possibly incomplete) results from a walk would be returned with a nil error.

I think this upstream commit was incorporated here in d924f05, which was shipped as v1.6.0.

https://www.kernel.org/doc/html/next/userspace-api/netlink/intro.html#dump-consistency

Dump consistency

Some of the data structures kernel uses for storing objects make it hard to provide an atomic snapshot of all the objects in a dump (without impacting the fast-paths updating them).

Kernel may set the NLM_F_DUMP_INTR flag on any message in a dump (including the NLMSG_DONE message) if the dump was interrupted and may be inconsistent (e.g. missing objects). User space should retry the dump if it sees the flag set.
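
To make the quoted rule concrete, here is a minimal sketch of the check a netlink client would perform over the flags of every message in a dump. The function name `dumpWasInterrupted` and the idea of collecting the header flags into a slice are assumptions made for this example; the only real-world values used are `NLM_F_DUMP_INTR` (0x10) and `NLM_F_MULTI` (0x2) from `<linux/netlink.h>`.

```go
package main

import "fmt"

// nlmFDumpIntr mirrors NLM_F_DUMP_INTR (0x10) from <linux/netlink.h>.
const nlmFDumpIntr uint16 = 0x10

// dumpWasInterrupted is a hypothetical helper. It takes the nlmsg_flags field
// of every message received in one dump (including the final NLMSG_DONE) and
// reports whether any of them carries NLM_F_DUMP_INTR.
func dumpWasInterrupted(msgFlags []uint16) bool {
	for _, f := range msgFlags {
		if f&nlmFDumpIntr != 0 {
			return true
		}
	}
	return false
}

func main() {
	const nlmFMulti uint16 = 0x2 // NLM_F_MULTI, set on multipart dump messages

	clean := []uint16{nlmFMulti, nlmFMulti, nlmFMulti}
	intr := []uint16{nlmFMulti, nlmFMulti | nlmFDumpIntr, nlmFMulti}

	fmt.Println(dumpWasInterrupted(clean), dumpWasInterrupted(intr))
}
```

If the check reports true, the results may be missing objects and the dump should be retried.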

More context can be found in vishvananda/netlink#1018, which later added netlink.ErrDumpInterrupted. This commit is yet to be released, though it is available on vishvananda/netlink trunk.

AIUI:

  • if you require a consistent result from a walk, and can tolerate a potentially unbounded wait, these netlink calls should be retried
  • otherwise, disregard the error if
      • errors.Is(err, unix.EINTR) on vishvananda/netlink v1.3.0, or
      • errors.Is(err, netlink.ErrDumpInterrupted) on vishvananda/netlink trunk
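
A rough sketch of those two options in Go, offered as an illustration rather than a proposed patch. The names `interrupted`, `listUntilConsistent`, `listBestEffort` and `errDumpInterrupted` are made up for this example. `errDumpInterrupted` stands in for `netlink.ErrDumpInterrupted` so that the snippet builds with only the standard library, and `syscall.EINTR` is used in place of `unix.EINTR` (on Linux both should be the same errno value). The `walk` argument represents a call such as `netlink.AddrList` wrapped in a closure. Whether partial results accompany the error depends on the netlink version, so the best-effort variant simply returns whatever the call handed back.

```go
package main

import (
	"errors"
	"fmt"
	"syscall"
)

// errDumpInterrupted is a hypothetical stand-in for netlink.ErrDumpInterrupted.
var errDumpInterrupted = errors.New("dump interrupted")

// interrupted reports whether err marks an interrupted, possibly inconsistent dump.
func interrupted(err error) bool {
	return errors.Is(err, syscall.EINTR) || errors.Is(err, errDumpInterrupted)
}

// listUntilConsistent retries walk until it is not interrupted.
// The wait is potentially unbounded.
func listUntilConsistent[T any](walk func() ([]T, error)) ([]T, error) {
	for {
		res, err := walk()
		if !interrupted(err) {
			return res, err
		}
	}
}

// listBestEffort calls walk once and disregards an interrupted-dump error.
// Any other error is passed through unchanged.
func listBestEffort[T any](walk func() ([]T, error)) ([]T, error) {
	res, err := walk()
	if interrupted(err) {
		return res, nil
	}
	return res, err
}

func main() {
	calls := 0
	// Fake walk: interrupted twice, then succeeds.
	walk := func() ([]string, error) {
		calls++
		if calls < 3 {
			return nil, syscall.EINTR
		}
		return []string{"lo"}, nil
	}

	res, err := listUntilConsistent(walk)
	fmt.Println(res, err, calls)
}
```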

ty

@thompson-shaun

We also seem to be running into this issue in moby/buildkit; ref: moby/buildkit#5533

robmry commented Jan 24, 2025

Yes, it's the same issue - the description captures it.

In moby we ended up trying up to (an arbitrary) five times, then using the results anyway - on the basis that we'd probably be better off than we were before vishvananda/netlink v1.2.1, because a retry is likely to succeed, and the worst case is the same as it was. That's in:
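
The moby change itself is not reproduced here. As a rough, hypothetical sketch of the approach described above (retry a bounded number of times, then use the results anyway), something along these lines would do it. `listWithRetries`, `maxAttempts` and `errDumpInterrupted` are names invented for this example, with `errDumpInterrupted` standing in for `netlink.ErrDumpInterrupted` so the snippet needs only the standard library. It assumes the walk returns its possibly incomplete results alongside the error.

```go
package main

import (
	"errors"
	"fmt"
)

// errDumpInterrupted is a hypothetical stand-in for netlink.ErrDumpInterrupted.
var errDumpInterrupted = errors.New("dump interrupted")

// maxAttempts is arbitrary, as in the description above.
const maxAttempts = 5

// listWithRetries calls walk up to maxAttempts times. If every attempt is
// interrupted, it returns the last (possibly incomplete) results with a nil
// error, which matches the behaviour before vishvananda/netlink v1.2.1.
func listWithRetries[T any](walk func() ([]T, error)) ([]T, error) {
	var res []T
	var err error
	for i := 0; i < maxAttempts; i++ {
		res, err = walk()
		if !errors.Is(err, errDumpInterrupted) {
			return res, err // success, or an unrelated error
		}
	}
	return res, nil
}

func main() {
	attempts := 0
	// Fake walk that is interrupted every time but still returns results.
	res, err := listWithRetries(func() ([]string, error) {
		attempts++
		return []string{"eth0"}, errDumpInterrupted
	})
	fmt.Println(res, err, attempts)
}
```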
