Loading entry points can be much faster #1631
Comments
What is the measured time difference? And an even better solution is to create a hash table at compile time so that no iteration needs to occur at runtime (except for unknown functions, since those aren't knowable at compile time).
Please see the link to the repo's README in the OP.
Yes, indeed that is the idea: there would be a table of hashes of all known functions (derived from the XML registry) baked into the repo. The hash done at runtime is of the function name being looked up. You'd still need an iteration, but comparing hashes instead of strings.
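A minimal sketch of that scheme, with a hand-written three-entry table standing in for the generated one. In the real loader the hashes would be compile-time constants derived from the XML registry rather than filled at startup, and FNV-1a is just an example hash, not necessarily what the POC uses:

```c
#include <stdint.h>
#include <string.h>

/* FNV-1a: the same hash a generator would apply offline to every
 * known entry-point name from the registry. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s)
        h = (h ^ (unsigned char)*s++) * 16777619u;
    return h;
}

/* Hypothetical table; real hashes would be baked in as constants. */
struct entry { uint32_t hash; const char *name; int index; };

static struct entry known[] = {
    { 0, "vkCreateInstance",    0 },
    { 0, "vkCreateDevice",      1 },
    { 0, "vkGetDeviceProcAddr", 2 },
};
enum { NKNOWN = sizeof known / sizeof known[0] };

/* Stand-in for the offline generation step. */
static void init_table(void)
{
    for (int i = 0; i < NKNOWN; i++)
        known[i].hash = fnv1a(known[i].name);
}

/* Hash the probe name once, then compare integers; strcmp runs only
 * when a hash matches, to guard against collisions. */
static int lookup(const char *name)
{
    uint32_t h = fnv1a(name);
    for (int i = 0; i < NKNOWN; i++)
        if (known[i].hash == h && strcmp(known[i].name, name) == 0)
            return known[i].index;
    return -1; /* unknown function: caller falls back to the slow path */
}
```

The point is that each miss costs one integer compare instead of a full string compare, and the single runtime hash is amortized over the whole scan.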
Yes, a compiled-in hash table would be ideal. This was just a POC to show that strcmp is guilty of massive overhead in libvulkan, and that changing it benefits everyone greatly. The time difference is on the page linked above, but I'll show it here for completeness. With stock libvulkan and using Volk as the API loader:
With libvulkan using my POC patch, and again using Volk:
The "average time" for each row is the average time to complete one call of the named function.
Also, my patch doesn't kill every one of the strcmp calls.
Ahh, I shot from the hip when I sent this response. I quickly scanned through the README originally and only saw the comparison between glad & Volk; it took a minute to piece together which was the 'unpatched' vs. the 'patched' run. These findings confirm my suspicion that the time taken by strcmp is not ideal, but not a deal breaker either. A lot of init time is spent inside vkCreateInstance & vkCreateDevice, both of which call the create functions on all drivers, but also in setting up the internal function dispatch tables, a constant overhead that grows with each new function added to the table. The patch is a wonderful proof of concept for the viability of this idea. I had always wanted to implement it, but never found the time nor a strong reason. Side note from reading the README:
vkEnumerateInstanceExtensionProperties is expensive for this reason. vkEnumerateDeviceExtensionProperties occurs after all drivers & layers that are to be loaded have been loaded. The loader has already sped up vkEnumerateInstanceExtensionProperties & vkCreateInstance by caching loaded drivers, which reduces the dlopen/dlclose overhead to a single occurrence. But this is only for drivers, not layers, and because the API was designed to have no global state, the loader doesn't cache the current state of the filesystem between global API calls.
Makes much sense, especially considering that the first hundred or so strcmp calls are done every single time, while the unknown-function support and more dynamic logic sit near the end and are rarely run.
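To make that cost concrete, here is a caricature of the linear strcmp chain; the names and their ordering are illustrative, not the loader's actual list:

```c
#include <string.h>

/* Caricature of the stock lookup: a chain of strcmp calls, so a name
 * near the end of the list pays for every miss before it, and an
 * unknown name pays for the entire scan. */
static int linear_lookup(const char *name)
{
    static const char *names[] = {
        "vkCreateInstance",
        "vkDestroyInstance",
        "vkCreateDevice",
        /* ...hundreds more generated from the registry... */
        "vkGetDeviceProcAddr",
    };
    for (size_t i = 0; i < sizeof names / sizeof names[0]; i++)
        if (strcmp(names[i], name) == 0)
            return (int)i; /* cost grows with position in the list */
    return -1;
}
```

With hundreds of entries, a full application init that resolves every entry point turns this O(n) scan per name into O(n²) character comparisons overall, which is exactly what the hash approach avoids.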
As @tycho demonstrates in this repo, loading entry points can be made much faster with this patch.
The basic principle is to hash the entry point names (offline) and the name being looked up, and then do a numerical lookup instead of strcmp compares. One could take this a step further: sort the pre-generated list of hashes and do a binary-search lookup.
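A sketch of the sorted-hashes variant, assuming the generator emits the hash list in ascending order. The hash values below are made up for illustration; a real lookup would still confirm a hit with one strcmp against a parallel name table, to guard against collisions:

```c
#include <stdint.h>
#include <stdlib.h>

/* Comparator for bsearch over the precomputed, ascending hash list. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* O(log n) integer compares instead of a linear scan.  The returned
 * index would select the entry in a parallel name/function table. */
static int find_hash(const uint32_t *sorted, size_t n, uint32_t h)
{
    const uint32_t *p = bsearch(&h, sorted, n, sizeof *sorted, cmp_u32);
    return p ? (int)(p - sorted) : -1;
}
```

For a few hundred entry points this is under ten integer compares per lookup, and the sorted table costs nothing extra at runtime since it is generated offline.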