[Issue]: Is maxBlocksPerMultiProcessor value wrong on MI210/MI250? #121
Problem Description
Hi,
To reproduce, query hipDeviceAttributeMaxBlocksPerMultiProcessor: it gives 2. However, when I estimate the maximum number of active workgroups from inside a kernel (see https://gist.github.com/Snektron/1fb62a39ee0d7b572c3441f0a53d310c), it seems clear that for workgroup sizes smaller than 1024 (say 64, 128, 256, or 512), the number of workgroups scheduled per CU can be higher than 2.
The computation
deviceProps.maxBlocksPerMultiProcessor = int(info.maxThreadsPerCU_ / info.maxWorkGroupSize_);
in https://github.com/ROCm/clr/blob/b8ba4ccf9c53f6558a5e369e3c1c05de97a0c28f/hipamd/src/hip_device.cpp#L496C77-L496C94 seems wrong.
What do you think?
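To make the concern concrete, here is a minimal sketch of the arithmetic in plain C++. The `CuLimits` struct and both function names are hypothetical and introduced only for illustration. The value 2048 for threads per CU is an assumption chosen to be consistent with the reported result of 2 and a maximum workgroup size of 1024; it is not read from a device. The second function is only a thread-count upper bound: real occupancy can be lower because of register, LDS, or hardware workgroup-slot limits.

```cpp
// Hypothetical per-CU limits, for illustration only (not queried from a device).
struct CuLimits {
    int maxThreadsPerCU;   // assumed 2048, consistent with the reported 2 = 2048 / 1024
    int maxWorkGroupSize;  // assumed 1024
};

// Mirrors the expression quoted from hip_device.cpp:
// the result does not depend on the workgroup size actually launched.
constexpr int reportedMaxBlocks(CuLimits limits) {
    return limits.maxThreadsPerCU / limits.maxWorkGroupSize;
}

// Upper bound on resident workgroups per CU from the thread count alone,
// for a specific workgroup size (ignores registers, LDS, and slot limits).
constexpr int threadBoundBlocks(CuLimits limits, int workGroupSize) {
    return limits.maxThreadsPerCU / workGroupSize;
}
```

Under these assumed limits, `reportedMaxBlocks(CuLimits{2048, 1024})` evaluates to 2, while `threadBoundBlocks(CuLimits{2048, 1024}, 256)` evaluates to 8. So the quoted expression appears to describe the case where every workgroup has the maximum size, rather than the maximum number of workgroups a CU can hold.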
Operating System
Ubuntu 24.04 LTS (Noble Numbat)
CPU
AMD EPYC 73F3 16-Core Processor
GPU
AMD Instinct MI210
ROCm Version
ROCm 6.2.4
ROCm Component
HIP
Steps to Reproduce
No response
(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support
No response
Additional Information
No response