
Topological alignment between GPUs and NICs in DRA (exposing pci device topology as device attribute?) #213

Open
everpeace opened this issue Dec 3, 2024 · 5 comments

Comments

@everpeace

everpeace commented Dec 3, 2024

I understand DRA will finally be promoted to Beta in v1.32 🎉 Thank you very much, contributors, for your hard work standardizing flexible device scheduling and implementing NVIDIA's dra-driver.

Do you have a plan to expose intra-node topology as device attributes? Especially distances between GPU↔GPU and GPU↔NIC or HCA (I imagine the equivalent of nvidia-smi topo -m)? Or would you have a plan to provide some extension point to add user-defined device attributes in this dra-driver?

I imagine the use cases below for optimizing training performance:

  • Single Node Multi GPUs:
    • a user wants to have 1 pod with 2 GPUs which are connected to each other via NVLink (NV# in nvidia-smi topo -m)
      → discussed in NVLINK Aware Scheduling #214
  • Multi Node Multi GPUs:
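
To make the question concrete, here is a hypothetical sketch of what such topology attributes could look like on a device published in a ResourceSlice. The attribute names (k8s.io/pcieRoot, nvidia.com/nvlinkPeers) are illustrative only; neither is defined by any driver or by Kubernetes today, and the exact field layout may differ by API version:

```yaml
# Hypothetical ResourceSlice device fragment -- the attribute names are
# illustrative, not standardized or implemented anywhere.
devices:
- name: gpu-0
  basic:
    attributes:
      k8s.io/pcieRoot:          # PCIe root complex this GPU hangs off
        string: pci0000:00
      nvidia.com/nvlinkPeers:   # GPUs reachable over NVLink
        string: "gpu-1,gpu-2,gpu-3"
```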

Thanks in advance.

@klueska
Collaborator

klueska commented Dec 4, 2024

I'm open to suggestions on what these attributes would look like and how they would be used, but as I mentioned in my comment here #214 (comment), I've struggled to come up with something that would actually be useful.

@everpeace
Author

Thanks,

Or would you have a plan to provide some extension point to add user-defined device attributes in this dra-driver?

How about this? If this driver provided such a knob, users would be able to publish their own extra attributes for their needs.

@everpeace
Author

everpeace commented Dec 11, 2024

  • Multi Node Multi GPUs:
    • a user would like to have N pods with 4 GPUs each, where each GPU has an adjacent NIC or HCA

For this use case, I found a presentation that exactly matches it.

Better Together! GPU, TPU and NIC Topological Alignment with DRA - John Belamaric, Google & Patrick Ohly, Intel

So, if both NVIDIA/k8s-dra-driver and kubernetes-sigs/cni-dra-driver exposed a k8s.io/pcieRoot attribute, a user could define a ResourceClaim like the one below, as described in the session:

apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaim
metadata:
  name: big-gpu-with-aligned-nic
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: gpu.nvidia.com
      selectors:
      - cel:
          expression: "device.capacity['memory'].compareTo(quantity('80Gi')) >= 0"
    - name: nic
      deviceClassName: rdma.nvidia.com
      selectors:
      - cel:
          expression: "device.attributes['sriovType'] == 'vf'"
    constraints:
    - requestNames: ["gpu", "nic"]
      matchAttribute: k8s.io/pcieRoot

Thus, I would like to know if NVIDIA/k8s-dra-driver plans to expose a k8s.io/pcieRoot attribute.

Thanks in advance.
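
As background for how a driver could derive such an attribute value, here is a minimal, hypothetical sketch in Go that extracts the PCIe root-complex segment from a device's sysfs path. This is not part of any existing driver; a real implementation would first resolve the device's canonical path via its /sys/bus/pci/devices/<BDF> symlink:

```go
package main

import (
	"fmt"
	"strings"
)

// pcieRootFromSysfsPath extracts the PCIe root-complex segment (e.g.
// "pci0000:00") from a resolved sysfs device path such as
// /sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0.
// It returns "" if no pciDDDD:BB segment is found.
func pcieRootFromSysfsPath(path string) string {
	for _, seg := range strings.Split(path, "/") {
		if strings.HasPrefix(seg, "pci") && strings.Contains(seg, ":") {
			return seg
		}
	}
	return ""
}

func main() {
	// Two devices on the same root complex would yield equal values,
	// which is exactly what a matchAttribute constraint needs.
	gpu := pcieRootFromSysfsPath("/sys/devices/pci0000:00/0000:00:02.0/0000:05:00.0")
	nic := pcieRootFromSysfsPath("/sys/devices/pci0000:80/0000:80:01.0/0000:81:00.0")
	fmt.Println(gpu, nic, gpu == nic) // prints: pci0000:00 pci0000:80 false
}
```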


  • Single Node Multi GPUs:
    • a user wants to have 1 pod with 2 GPUs which are connected to each other via NVLink (NV# in nvidia-smi topo -m)

Because #214 already covers that case, I rephrased this issue's title to keep the discussion focused.

@everpeace everpeace changed the title Exposing Intra-Node Topology (Distance between GPUs, or GPU and NIC, HCA) as device attributes Topological alignment between GPUs and NICs in DRA (exposing pci device topology as device attribute?) Dec 11, 2024
@klueska
Collaborator

klueska commented Dec 11, 2024

Unfortunately, we can't include this until we start to standardize the set of attributes we put under the k8s.io/* prefix. I could include it under the nvidia.com/* prefix, but that is less useful, since you then can't use it in a matchAttribute to match against an attribute with a different prefix from a different driver.
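
To illustrate the limitation being described here: a matchAttribute constraint compares one fully-qualified attribute name across all listed requests, so a vendor-prefixed name only works if every driver involved publishes it. The attribute name below is hypothetical:

```yaml
# Hypothetical: this constraint can only be satisfied if BOTH the GPU
# driver and the NIC driver publish an attribute with this exact
# fully-qualified name -- which a non-NVIDIA NIC driver would not do.
constraints:
- requestNames: ["gpu", "nic"]
  matchAttribute: nvidia.com/pcieRoot
```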

@everpeace
Copy link
Author

Thanks for the quick reply. OK, then, let me keep this open for now.
