[Issue]: Is maxBlocksPerMultiProcessor value wrong on MI210/MI250? #121

Open
fxmarty-amd opened this issue Jan 9, 2025 · 1 comment

@fxmarty-amd

Problem Description

Hi,

To reproduce, run:

```cpp
#include <hip/hip_runtime.h>
#include <iostream>

// Evaluate the HIP call exactly once, then print a warning if it failed.
// (Using the macro argument twice would issue the API call twice.)
#define HIP_WARN(expr)                                                  \
    do {                                                                \
        hipError_t err_ = (expr);                                       \
        if (err_ != hipSuccess)                                         \
            std::cerr << "HIP Error: " << hipGetErrorString(err_)       \
                      << ", at line " << __LINE__ << std::endl;         \
    } while (0)

int main() {
    int devCount = 0;
    HIP_WARN(hipGetDeviceCount(&devCount));
    std::cout << "Number of devices: " << devCount << "\n";

    // Query the per-CU limits reported for device 0.
    int block_per_sm = 0;
    int thread_per_sm = 0;
    HIP_WARN(hipDeviceGetAttribute(&block_per_sm, hipDeviceAttributeMaxBlocksPerMultiProcessor, 0));
    HIP_WARN(hipDeviceGetAttribute(&thread_per_sm, hipDeviceAttributeMaxThreadsPerMultiProcessor, 0));

    std::cout << "Max blocks per CU: " << block_per_sm << "\n";
    std::cout << "Max threads per CU: " << thread_per_sm << "\n";
    return 0;
}
```

hipDeviceAttributeMaxBlocksPerMultiProcessor reports 2. However, when estimating the maximum number of concurrently active workgroups from inside a kernel (see https://gist.github.com/Snektron/1fb62a39ee0d7b572c3441f0a53d310c), it seems clear that for workgroup sizes smaller than 1024 (e.g. 64, 128, 256, or 512), more than 2 workgroups can be scheduled per CU.

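The computation `deviceProps.maxBlocksPerMultiProcessor = int(info.maxThreadsPerCU_ / info.maxWorkGroupSize_);` in https://github.com/ROCm/clr/blob/b8ba4ccf9c53f6558a5e369e3c1c05de97a0c28f/hipamd/src/hip_device.cpp#L496C77-L496C94 seems wrong: it divides the per-CU thread limit by the largest possible workgroup size, so the result only describes the 1024-thread case rather than the maximum number of (possibly smaller) workgroups a CU can hold.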

What do you think?

Operating System

Ubuntu 24.04 LTS (Noble Numbat)

CPU

AMD EPYC 73F3 16-Core Processor

GPU

AMD Instinct MI210

ROCm Version

ROCm 6.2.4

ROCm Component

HIP

Steps to Reproduce

No response

(Optional for Linux users) Output of /opt/rocm/bin/rocminfo --support

No response

Additional Information

No response

@lucbruni-amd

Hi @fxmarty-amd, thank you for reporting. An internal ticket has been opened for investigation.
