[lspci] Fix handling of more than one PCIe domain #1822
@Babar can you add a couple of cases in `spec/unit/plugins/linux/lspci_spec.rb` to match the additional regex component specifically?
Please expand the test cases to cover the additional regex component.
Summary: I'm guessing most machines only have one PCIe root port, but things like ARM hosts tend to have more, and that breaks, as it creates collisions. As we want to keep backwards compatibility, and `lspci` is run with its defaults, which don't show the root port if it's `0000`, only add it when it's non-zero, and also add it as a new extra field called `root_port` to denote the fact that there is one (useful when using the BDF to find files on disk: when there is no `root_port`, one needs to prepend the extra `0000:`).

More detailed explanation I wrote on the Slack channel:

On Linux, we run `lspci -vnnmk`, which shows the BDF in the first `Device:` field. So the first bug is that the regular expression on https://github.com/chef/ohai/blob/main/lib/ohai/plugins/linux/lspci.rb#L54 is missing a `\` before the `.`, therefore it will match `0001:02:03.4` as `01:02:0`, which is plain broken. But fixing that, we end up with `02:03.4`, which is also plain wrong as it loses the root complex. I see a few ways we could fix this:

1. Highly disruptive: we add `-D` to `lspci`, which means we'll always get the root complex, even when it's `0000`. This is terrible 'cause it will basically break everything.
2. We do this, but only with an option in Ohai. Less terrible, but it means people have to know about that option.
3. We do this, but only when we notice more than one root complex. That's not really doable.
4. We add a new field, `root_complex`, in the PCI structure, which is `0000` by default and the actual root complex for everything else.

I like 4 the most, as it closely mirrors what lspci does (https://github.com/pciutils/pciutils/blob/master/lspci.c#L285), or maybe:

5. We do it the same way lspci does it: if the root complex is 0, we just store the BDF as the ID; if it's not, we store the root complex + BDF.

We can't really do 4 'cause we might have collisions... Like if you have `0000:01:02.3` and `0001:01:02.3`, then they have to be stored differently. So we can either do:

- Option A: `01:02.3` and `0001:01:02.3` (this can co-exist with everything today)
- Option B: `0000:01:02.3` and `0001:01:02.3` (this most likely needs to become `pci2`)

Test Plan:

Before:
```
$ ohai | jq .pci\|keys
[
  "00:00.0",
  "01:00.0",
  "01:00.1",
  "02:00:0",
  "04:00:0",
  "08:00:0",
  "08:01:0",
  "08:02:0",
  "08:03:0",
  "08:04:0",
  "08:05:0",
  "09:00:0",
  "09:01:0"
]
```

After:
```
$ ohai | jq .pci\|keys
[
  "0002:00:00.0",
  "0004:00:00.0",
  "0008:00:00.0",
  "0008:01:00.0",
  "0008:02:01.0",
  "0008:02:02.0",
  "0008:03:00.0",
  "0008:04:00.0",
  "0008:05:00.0",
  "0009:00:00.0",
  "0009:01:00.0",
  "00:00.0",
  "01:00.0",
  "01:00.1"
]
```

Reviewers: jaymzh
Closes: chef#1693
Signed-off-by: Olivier Raginel <[email protected]>
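To make the regex bug concrete, here is a minimal Ruby sketch. The patterns are an approximation of the one in `lspci.rb` (two hex digits, `:`, two hex digits, one character, one hex digit), rebuilt for illustration rather than copied, and `DOMAIN_BDF` is a hypothetical pattern that captures the domain separately. The unescaped match reproduces the 7-character keys such as `02:00:0` seen in the Before output.

```ruby
# Approximation of the BDF pattern in lib/ohai/plugins/linux/lspci.rb,
# rebuilt for illustration; the real expression may differ in detail.
H  = "[0-9a-fA-F]"
HH = "#{H}#{H}"

# Bug: the unescaped "." matches any character, including ":".
BUGGY_BDF = /(#{HH}:#{HH}.#{H})/
# Escaping the dot fixes the shape of the match but ignores the domain.
ESCAPED_BDF = /(#{HH}:#{HH}\.#{H})/
# Hypothetical: capture an optional four-hex-digit domain separately.
DOMAIN_BDF = /(?:(#{H}{4}):)?(#{HH}:#{HH}\.#{H})/

field = "0001:02:03.4"
buggy   = field[BUGGY_BDF, 1]    # => "01:02:0"
escaped = field[ESCAPED_BDF, 1]  # => "02:03.4"
m = field.match(DOMAIN_BDF)
domain, bdf = m[1], m[2]         # domain = "0001", bdf = "02:03.4"
```

Escaping the dot alone fixes the key shape but still drops the domain, which is why the options above are needed at all.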
@tpowell-progress: added one. I can add more, but wasn't sure what I should be testing.
Description
(Sorry, I'm using domain and root port interchangeably. Maybe domain is more correct; if so, I can update the code to use that instead of `root_port`.)

I'm guessing most machines only have one PCIe root port, but things like ARM hosts tend to have more, and that breaks, as it creates collisions.

As we want to keep backwards compatibility, and `lspci` is run with its defaults, which don't show the root port if it's `0000`, only add it when it's non-zero, and also add it as a new extra field called `root_port` to denote the fact that there is one (useful when using the BDF to find files on disk: when there is no `root_port`, one needs to prepend the extra `0000:`).

More detailed explanation I wrote on the Slack channel:

On Linux, we run `lspci -vnnmk`, which shows the BDF in the first `Device:` field. So the first bug is that the regular expression on https://github.com/chef/ohai/blob/main/lib/ohai/plugins/linux/lspci.rb#L54 is missing a `\` before the `.`, therefore it will match `0001:02:03.4` as `01:02:0`, which is plain broken. But fixing that, we end up with `02:03.4`, which is also plain wrong as it loses the root complex. I see a few ways we could fix this:

1. Highly disruptive: we add `-D` to `lspci`, which means we'll always get the root complex, even when it's `0000`. This is terrible 'cause it will basically break everything.
2. We do this, but only with an option in Ohai. Less terrible, but it means people have to know about that option.
3. We do this, but only when we notice more than one root complex. That's not really doable.
4. We add a new field, `root_complex`, in the PCI structure, which is `0000` by default and the actual root complex for everything else.

I like 4 the most, as it closely mirrors what lspci does (https://github.com/pciutils/pciutils/blob/master/lspci.c#L285), or maybe:

5. We do it the same way lspci does it: if the root complex is 0, we just store the BDF as the ID; if it's not, we store the root complex + BDF.

We can't really do 4 'cause we might have collisions... Like if you have `0000:01:02.3` and `0001:01:02.3`, then they have to be stored differently. So we can either do:

- Option A: `01:02.3` and `0001:01:02.3` (this can co-exist with everything today)
- Option B: `0000:01:02.3` and `0001:01:02.3` (this most likely needs to become `pci2`)
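Option A and the `root_port` field can be sketched in a few lines of Ruby, assuming the `Device:` value is a bare BDF with an optional four-hex-digit domain prefix. `pci_key` and `sysfs_path` are hypothetical helpers for illustration, not Ohai's actual code; `sysfs_path` assumes the usual Linux `/sys/bus/pci/devices/DDDD:BB:DD.F` naming, which always includes the domain.

```ruby
# Hypothetical sketch of "Option A"; helper names are illustrative only.
HEX = "[0-9a-fA-F]"
# Optional four-hex-digit domain, then bus:device.function.
DEVICE_FIELD = /\A(?:(#{HEX}{4}):)?(#{HEX}{2}:#{HEX}{2}\.#{HEX})\z/

# Returns [key, root_port]. root_port is nil when the domain is absent or
# 0000, so those devices keep the key they have today.
def pci_key(device_field)
  m = device_field.match(DEVICE_FIELD)
  return nil unless m

  domain, bdf = m[1], m[2]
  if domain.nil? || domain.to_i(16).zero?
    [bdf, nil]
  else
    ["#{domain}:#{bdf}", domain]
  end
end

# Sysfs device names always include the domain, so prepend "0000:" when
# there is no root_port.
def sysfs_path(key, root_port)
  name = root_port ? key : "0000:#{key}"
  File.join("/sys/bus/pci/devices", name)
end

# Build a devices hash the way the plugin might, adding root_port only
# for non-zero domains.
devices = {}
["00:00.0", "01:00.1", "0000:01:00.0", "0002:00:00.0", "0008:02:01.0"].each do |field|
  key, root_port = pci_key(field)
  entry = devices[key] ||= {}
  entry["root_port"] = root_port if root_port
end
```

With these sample inputs, domain-0000 devices keep their short keys (`01:00.0`), while devices on other domains get the prefixed key (`0008:02:01.0`) plus a `root_port` entry, so the two can no longer collide.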
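Test Plan:

Before:
```
$ ohai | jq .pci\|keys
[
  "00:00.0",
  "01:00.0",
  "01:00.1",
  "02:00:0",
  "04:00:0",
  "08:00:0",
  "08:01:0",
  "08:02:0",
  "08:03:0",
  "08:04:0",
  "08:05:0",
  "09:00:0",
  "09:01:0"
]
```

After:
```
$ ohai | jq .pci\|keys
[
  "0002:00:00.0",
  "0004:00:00.0",
  "0008:00:00.0",
  "0008:01:00.0",
  "0008:02:01.0",
  "0008:02:02.0",
  "0008:03:00.0",
  "0008:04:00.0",
  "0008:05:00.0",
  "0009:00:00.0",
  "0009:01:00.0",
  "00:00.0",
  "01:00.0",
  "01:00.1"
]
```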
Related Issue
pci collection should always include the domain #1693