-
Notifications
You must be signed in to change notification settings - Fork 575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[wip] MSL: Fix dynamically indexed pull interpolants #2364
base: main
Are you sure you want to change the base?
Conversation
This is a first attempt at fixing the test failures mentioned in KhronosGroup#1796 by copying scalarized/flattened arrays back into arrays in the fragment shader for each "unique" interpolate opration. This approach has as number of drawbacks such as negative impacts to runtime performance and shader size, as well as having limited coverage for dynamic offsets (InterpolateAtOffset) and samples (InterpolateAtSample).
FWIW... The original 24 CTS failures in #1796 have already been fixed. The 3 additional pull-model interpolation CTS failures that @aitor-lunarg added to #1796, are still occurring, but report a different error from the original (see #1796-comment). So, the fixes here should just address pull-model interpolation, which I see it mostly does, but any extra aimed at the original 24 errors, would not be needed. However, I have not reviewed the code here in enough detail to confirm any unnecessary code. |
It's very hard to reason from this PR what is being solved exactly. A solution that doesn't solve interpolateAtOffset and Samples is not good enough I think. Do you have an example GLSL shader that demonstrates the problem? I barely remember what the pull model MSL stuff is and need a refresher. |
In Metal, pull-model interpolants use a special wrapper template, As I recall, the problem is that Metal only allows certain types in a |
I agree. This particular change is trying to address the CTS failures, and if this is the correct approach there should be a follow up change to make it more general.
The fragment shader from #version 450
layout(location = 0) sample in vec2 inVar[2];
layout(location = 0) out vec4 fs_out_color;
void main (void)
{
int i = int(gl_FragCoord.x) % 2;
// This is where spirv-cross will throw at https://github.com/KhronosGroup/SPIRV-Cross/blob/d33a39045dd8f5b87eadc3d4cc88b32918d1c41a/spirv_msl.cpp#L8395 due to i not being a constant.
const float x = interpolateAtCentroid(inVar[i].x);
fs_out_color = vec4(x, 0, 0, 1);
} reproduces the issue. Replacing // ...
struct main0_in
{
interpolant<float2, interpolation::perspective> inVar_0 [[user(locn0)]];
interpolant<float2, interpolation::perspective> inVar_1 [[user(locn1)]];
};
fragment main0_out main0(main0_in in [[stage_in]], uint gl_SampleID [[sample_id]], float4 gl_FragCoord [[position]])
{
main0_out out = {};
spvUnsafeArray<float2, 2> inVar = {};
// One new array per interpolate function times unique value parameter passed to that function.
// This would obviously increase the shader size by quite a bit, though realistically perhaps not very
// much after optimization. This could also be done via setting up a large "if/else" branch; I'm not
// sure what's more amenable to metal.
spvUnsafeArray<float2, 2> inVar_centroid_108;
inVar[0] = in.inVar_0.interpolate_at_sample(gl_SampleID);
inVar_centroid_108[0] = in.inVar_0.interpolate_at_centroid();
inVar[1] = in.inVar_1.interpolate_at_sample(gl_SampleID);
inVar_centroid_108[1] = in.inVar_1.interpolate_at_centroid();
gl_FragCoord.xy += get_sample_position(gl_SampleID) - 0.5;
int i = int(gl_FragCoord.x) % 2;
// This is the "solution"
float x = inVar2_centroid_108[i].x;
out.fs_out_color += float4(x, 0.0, 0.0, 1.0);
return out;
} This change will work for the // ...
float acc = 0.0;
for (int i = 0; i < 4; ++i) {
acc += interpolateAtSample(inVar[i].x, i);
}
// ... This is perhaps doable with loop unrolling, but if it's not possible to unroll the loop for some reason, I'm not sure how to approach it. |
It seems like arrays should work, e.g. this compiles:
|
If the array can be placed in a wrapper struct that resolves the pointer stuff, this sounds like the way to go. I'm not overly concerned with performance. This is an extremely esoteric feature that only has to work, not be optimal. |
Yep, that works and makes this so much easier; I should have tried that in metal first :/ I think I was making some erroneous assumptions based on Lines 8395 to 8396 in d33a390
I've shifted to working on a different issue, but will try to find some cycles to wrap this up next week time permitting. Thank you @HansKristian-Work! |
This is a first attempt at fixing the test failures mentioned in #1796 by copying scalarized/flattened arrays back into arrays in the fragment shader for each "unique" interpolate opration. This approach has as number of drawbacks such as negative impacts to runtime performance and shader size, as well as having limited coverage for dynamic offsets (InterpolateAtOffset) and samples (InterpolateAtSample).
This change currently fixes all the CTS tests mentioned in #1796 except
dEQP-VK.pipeline.monolithic.multisample_interpolation.nonuniform_interpolant_indexing.offset
, which requires some additional work (or runningspirv-opt -O
on the spirv before running spirv-cross also works), however, I'd like to get an idea of whether or not this approach would be acceptable in spirv-cross or an alternative is preferable.