# Histogram Block Refactoring
## Motivation
When using histogram loss, loss calculation can take up a significant portion of the training time.
This PR improves on that by refactoring the original loss implementations while keeping their
outputs consistent with the originals.
## Content
This PR contains refactored histogram losses. It adds a common base class to make future additions simpler and to
reduce code duplication. Common functionality, such as the kernel methods, resizing (sampling), pixel counting, and intensity scaling, has been extracted into functions found in the new base class module.
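To illustrate the shape of such a base class, here is a minimal sketch in numpy. All names, the bin layout, and the kernel choice are illustrative assumptions, not the actual API introduced by this PR:

```python
import numpy as np

class HistogramBlockBase:
    """Illustrative sketch of a shared base for histogram blocks.

    The method names and parameters below are hypothetical; they only
    show how sampling, intensity scaling, and the binning kernel could
    be factored out of the individual colour-space blocks.
    """

    def __init__(self, num_bins: int = 64, sigma: float = 0.02):
        self.num_bins = num_bins
        self.sigma = sigma
        # Bin centres shared by all colour-space variants (assumed range).
        self.centers = np.linspace(-3.0, 3.0, num_bins, dtype=np.float32)

    @staticmethod
    def sample_pixels(img: np.ndarray, max_pixels: int) -> np.ndarray:
        """Randomly subsample pixels so the kernel cost stays bounded."""
        flat = img.reshape(-1, img.shape[-1])
        if flat.shape[0] <= max_pixels:
            return flat
        idx = np.random.choice(flat.shape[0], max_pixels, replace=False)
        return flat[idx]

    @staticmethod
    def intensity_scale(px: np.ndarray) -> np.ndarray:
        """Per-pixel intensity weight (Euclidean norm over channels)."""
        return np.sqrt((px ** 2).sum(axis=-1))

    def kernel(self, values: np.ndarray) -> np.ndarray:
        """Soft-binning kernel: weight of each value for each bin centre."""
        d = values[:, None] - self.centers[None, :]
        return 1.0 / (1.0 + (d / self.sigma) ** 2)
```

A concrete block (e.g. for RGB-uv) would then only implement its colour-space projection and call these shared helpers.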
## Results
The refactored histogram blocks are now significantly smaller and easier to understand (at least in my opinion🤔).
The public interface and default parameters have been kept as-is, so no changes should be required on the caller side.
Performance tests showed an improvement of >5x for each of the RGB-uv, rg-chroma, and Lab blocks using their respective default parameters.
This improvement only grows on non-datacentre-class hardware, such as consumer and professional GPUs.
On consumer-grade hardware, uplifts of up to 20x have been observed, primarily because of these devices' very limited double-precision support.
Accuracy w.r.t. the original implementation is acceptable. In all but some synthetic tests, the mean deviation between the original and refactored versions is well within the machine epsilon of 32-bit floating-point maths.
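For reviewers who want to reproduce the accuracy check, the comparison amounts to something like the following sketch (the helper name is hypothetical; the two arrays stand in for histograms produced by the original and refactored blocks):

```python
import numpy as np

def within_float32_eps(reference: np.ndarray, refactored: np.ndarray) -> bool:
    """Return True if the mean deviation between the two outputs stays
    within float32 machine epsilon (~1.19e-7).

    Accumulating in float64 keeps the comparison itself from adding
    rounding error on top of the quantity being measured.
    """
    eps32 = np.finfo(np.float32).eps
    mean_dev = np.abs(reference.astype(np.float64)
                      - refactored.astype(np.float64)).mean()
    return mean_dev <= eps32
```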
## Limitations
The refactored version drops support for input images bigger than about 2.1 gigapixels (i.e. roughly 45,000 by 45,000 pixels).
I personally don't think this limitation will pose an issue in the foreseeable future, but I think it's worth mentioning nonetheless😉.
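The ~2.1 gigapixel figure happens to match the signed 32-bit integer maximum (2^31 − 1 ≈ 2.147 × 10^9), which suggests, though the PR does not state it, that flattened pixel indices are held in an int32 somewhere. If that is the cause, a defensive guard could fail fast instead of silently overflowing (the helper name is hypothetical):

```python
import numpy as np

# Rough pixel-count ceiling implied by signed 32-bit indexing (an
# assumption about the cause of the limit, not confirmed by the PR).
MAX_PIXELS = np.iinfo(np.int32).max  # 2_147_483_647, ~2.1 gigapixels

def check_input_size(height: int, width: int) -> None:
    """Hypothetical guard: raise early for inputs past the supported size."""
    if height * width > MAX_PIXELS:
        raise ValueError(
            f"Input of {height}x{width} pixels exceeds the supported "
            f"maximum of {MAX_PIXELS} pixels (~45,000 x 45,000)."
        )

check_input_size(45_000, 45_000)  # 2.025e9 pixels: just under the limit
```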
And finally: thank you for your great work and for publishing your code!