perf_counter test might fail on processors with L4 cache #2435

Open
miladfarca opened this issue Jan 9, 2025 · 3 comments

Comments

@miladfarca

This assertion currently fails on a PowerPC machine (little-endian, running RHEL) because the measured value is much lower than expected:

HWY_ASSERT(values[PerfCounters::kL3Loads] == 0.0 ||

i.e.

values[PerfCounters::kL3Loads] = 50.72

The values array looks like this:

{0, 2680918956.7209449, 405940726.6087172, 162.43341942283425, 0, 0, 3.0080262856080417, 50.728248198644259, 0, 18730.659643699411, 0 <repeats 54 times>}

Since Power machines do have an L4 cache, I'm assuming that PERF_COUNT_HW_CACHE_LL measures the last-level cache, which here is L4 rather than L3. That would explain why the value is much lower than expected.
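For context, a minimal sketch of how a last-level-cache event is packed into perf_event_attr::config for a PERF_TYPE_HW_CACHE counter, following the layout described in the perf_event_open man page. The namespace, constant names and CacheConfig helper are hypothetical; the numeric values are copied from <linux/perf_event.h> so the sketch compiles without kernel headers. Whether kL3Loads is built exactly this way in Highway is an assumption.

```cpp
#include <cstdint>

// Hypothetical names; values mirror the enums in <linux/perf_event.h>.
namespace sketch {
constexpr uint64_t kCacheLL = 2;       // PERF_COUNT_HW_CACHE_LL
constexpr uint64_t kOpRead = 0;        // PERF_COUNT_HW_CACHE_OP_READ
constexpr uint64_t kResultAccess = 0;  // PERF_COUNT_HW_CACHE_RESULT_ACCESS
constexpr uint64_t kResultMiss = 1;    // PERF_COUNT_HW_CACHE_RESULT_MISS

// Packs cache id, operation and result into the config field:
// id in bits 0-7, op in bits 8-15, result in bits 16-23.
constexpr uint64_t CacheConfig(uint64_t id, uint64_t op, uint64_t result) {
  return id | (op << 8) | (result << 16);
}
}  // namespace sketch
```

Nothing in this encoding names a cache level. "LL" is whatever the kernel and PMU driver map to the last level, which is consistent with the guess that a Power machine reports L4 here.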

If this is true, we could #ifdef the test and lower the threshold on PPC, or lower it globally for all platforms.
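A rough sketch of the two options, assuming the elided half of the assertion compares the reading against a minimum. All names here (IsPPC, MinLLLoads, LLLoadsPlausible) are hypothetical and not Highway's actual test code. The thresholds are passed as parameters because the current value is not shown above.

```cpp
// Hypothetical helpers for illustration; not Highway's actual test code.

// True when compiling for a Power target (macros predefined by GCC/Clang).
constexpr bool IsPPC() {
#if defined(__powerpc__) || defined(__powerpc64__) || defined(__PPC__)
  return true;
#else
  return false;
#endif
}

// Per-platform option: use `ppc_min` on Power and `default_min` elsewhere.
// The global option is simply passing one lower value for both.
constexpr double MinLLLoads(double default_min, double ppc_min) {
  return IsPPC() ? ppc_min : default_min;
}

// Assumed shape of the check: a reading of zero is accepted (presumably the
// counter is unavailable); otherwise it must reach the minimum.
constexpr bool LLLoadsPlausible(double measured, double min_loads) {
  return measured == 0.0 || measured >= min_loads;
}
```

With the reading from this report, LLLoadsPlausible(50.72, min_loads) holds for any min_loads up to 50.72, so either option works as long as the Power threshold sits below that.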

@miladfarca
Author

/cc @jan-wassenberg

@jan-wassenberg
Member

Thanks for pointing that out! I agree an L4 LLC would have fewer misses, so we can reduce the threshold to 10 (vs. your 50).

@miladfarca
Author

miladfarca commented Jan 9, 2025

Thank you for making the PR.
