Rework hlsl-vector-type into two specs #361
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -1,146 +1,278 @@ | ||||||||
<!-- {% raw %} --> | ||||||||
|
||||||||
* Proposal: [0026-HLSL-Vectors](0026-hlsl-long-vector-type.md) | ||||||||
* Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra) | ||||||||
* Sponsor: [Damyan Pepper](https://github.com/damyanp) | ||||||||
* Status: **Under Consideration** | ||||||||
|
||||||||
# HLSL Long Vectors | ||||||||
|
||||||||
* Proposal: [0026-HLSL-Vectors](0026-hlsl-vector-type.md) | ||||||||
* Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra), [Greg Roth](https://github.com/pow2clk) | ||||||||
* Sponsor: [Greg Roth](https://github.com/pow2clk) | ||||||||
* Status: **Under Consideration** | ||||||||
|
||||||||
## Introduction | ||||||||
|
||||||||
HLSL has supported vectors in a limited capacity (int3, float4, etc.), and these are scalarized in DXIL; small vectors, while useful in a traditional graphics context, do not scale well with the evolution of HLSL as a more general purpose language targeting Graphics and Compute. Notably, with the ubiquitous adoption of machine learning techniques which often get expressed as vector-matrix operations, there is a need for supporting larger vector sizes in HLSL and preserving these vector objects at the DXIL level to take advantage of specialized hardware that can accelerate vector operations. | ||||||||
HLSL has always supported vectors of as many as four elements of different element types (int3, float4, etc.). | ||||||||
These are useful in a traditional graphics context for representation and manipulation of | ||||||||
|
||||||||
geometry and color information. | ||||||||
The evolution of HLSL as a more general purpose language targeting Graphics and Compute | ||||||||
greatly benefits from longer vectors to fully represent these operations rather than trying to | ||||||||
break them down into smaller constituent vectors. | ||||||||
This feature adds the ability to declare and use native HLSL vectors longer than four elements. | ||||||||
|
||||||||
|
||||||||
## Motivation | ||||||||
|
||||||||
The adoption of machine learning techniques expressed as vector-matrix operations | ||||||||
requires larger vector sizes to be representable in HLSL. | ||||||||
To take advantage of specialized hardware that can accelerate vector operations, | ||||||||
these and other vector objects need to be preserved at the DXIL level. | ||||||||
|
||||||||
## Proposed solution | ||||||||
|
||||||||
Enable vectors of longer length in HLSL and preserve the vector type in DXIL. | ||||||||
Enable vectors of length between 4 and 128 inclusive in HLSL using existing template-based vector declarations. | ||||||||
Review comment: if the range is inclusive, then that 4 should be 5. |
||||||||
Such vectors will hereafter be referred to as "long vectors". | ||||||||
These will be supported for all elementwise intrinsics that take variable-length vector parameters. | ||||||||
For certain operations, these vectors will be represented as native vectors using [dxil vectors](NNNN-dxil-vectors.md). | ||||||||
|
||||||||
## Detailed design | ||||||||
|
||||||||
### HLSL vectors `vector<T, N>` | ||||||||
### HLSL vectors | ||||||||
|
||||||||
Currently HLSL allows declaring vectors using a templated representation: | ||||||||
|
||||||||
Currently HLSL allows `vector<T, N> name;` where `T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type and `N`, number of | ||||||||
components, is a positive integer less than or equal to 4. See current definition [here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector). | ||||||||
This proposal extends this support to longer vectors (beyond 4). | ||||||||
```hlsl | ||||||||
vector<T, N> name; | ||||||||
``` | ||||||||
|
||||||||
`T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type. | ||||||||
`N` is the number of components and must be an integer between 1 and 4 inclusive. | ||||||||
See the vector definition [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more details. | ||||||||
This proposal adds support for long vectors of length greater than 4 by | ||||||||
allowing `N` to be greater than 4 where previously such a declaration would produce an error. | ||||||||
Review comment: How does this design accommodate
Reply: Runtime specialization constants are not currently supported in HLSL. That's a question we'll need to answer when it is. For now, any similar mechanism produces an error: https://godbolt.org/z/5Kszf9Tr3 This includes the existing vk:: namespace specialization/push constant support.
|
||||||||
|
||||||||
Review comment: Somewhat related to what @tex3d asked, but more broad: the spec needs to say how N can be written.
I guess it has to be statically computed. (I'm not sure what the HLSL term for it.) |
||||||||
The default behavior of HLSL vectors is preserved for backward compatibility, meaning, skipping the last parameter `N` | ||||||||
defaults to 4-component vectors and the use `vector name;` declares a 4-component float vector, etc. More examples | ||||||||
[here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector). | ||||||||
Declarations of long vectors require the use of the template declaration. | ||||||||
Unlike vector sizes between 1 and 4, no shorthand declarations that concatenate | ||||||||
the element type and number of elements (e.g. float2, double4) are allowed for long vectors. | ||||||||
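A minimal sketch of what these declaration rules could look like in practice, assuming a compiler that implements this proposal. The function and variable names are hypothetical, and the commented-out line shows a declaration that the text above says is not allowed.

```hlsl
// Locals are used here because long vectors are not permitted in cbuffers,
// which is where non-static global variables would otherwise be placed.
void DeclareVectors() {
  vector<float, 4> shortVec = float4(0, 0, 0, 0);      // same type as float4
  vector<float, 8> longVec = {0, 1, 2, 3, 4, 5, 6, 7}; // long vector: template syntax only
  vector<int, 128> maxVec;  // 128 is the largest length this proposal allows
  // float8 noShorthand;    // expected error: no shorthand name exists for N > 4
}
```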
|
||||||||
The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave | ||||||||
uniformity requirements, but implementations may specify best practices in certain uses for optimal performance. | ||||||||
uniformity requirements, but implementations may specify best practices in certain uses for optimal performance. | ||||||||
Review comment: Is this line needed for the long vector spec? It's probably implicitly assumed?
Reply: Which part do you take to be implicitly assumed? I don't think support across all shader stages should be assumed. The control/wave uniformity requirements touch more on the followups and could probably be removed. I'm inclined to leave the implementations encouraging best practices language as some platforms might end up scalarizing vector operations at some point or another which won't lead to the best performance. |
||||||||
|
||||||||
Review comment: I assume the type of a local (function scoped) variable can have long vector type. That's not mentioned. The examples below show cases like these. |
||||||||
**Restrictions on the uses of vectors with N > 4** | ||||||||
Long vectors are not permitted in: | ||||||||
|
||||||||
* Vectors with length greater than 4 are not permitted inside a `struct`. | ||||||||
* Vectors with length greater than 4 are not permitted as shader input/output parameters. | ||||||||
* Resource types other than ByteAddressBuffer or StructuredBuffer. | ||||||||
* Any element of the shader's signature including entry function parameters and return types. | ||||||||
Review comment: Does this include or exclude things like payload (DXR or mesh), attribute, or node record structures? Testing scenarios below make it clear, but it should be clear here, and this should also include corresponding intrinsics that accept UDT values leading to these shader parameter types elsewhere. We should also differentiate temporary limitations for the first implementation from limitations that have a good reason to be more permanent in the language.
Reply: I've added a bit more detail here and gone into still more detail about the ray tracing interfaces that are disallowed in the diagnostics section. I think that's an appropriate place where here we can be more vague.
I thought we determined that we wouldn't draw such distinctions in this spec. It's my intention to document it as it will ultimately be, removing any references to temporary approaches. From context, I suspect you may think of this as a "temporary" approach? I don't think we are relaxing these restrictions as part of this shader model release. Mention of other possibilities might make sense in the "alternatives considered" section as potential future work, but otherwise, I'd prefer to leave it unmentioned. |
||||||||
* Cbuffers or tbuffers. | ||||||||
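The restrictions listed above might look as follows in source, assuming a compiler that implements this proposal. Resource and function names are hypothetical, and the commented-out lines are the cases expected to be rejected.

```hlsl
RWStructuredBuffer< vector<float, 8> > weights; // allowed: StructuredBuffer element
RWByteAddressBuffer rawData;                    // allowed: accessed via templated Load/Store

// Buffer< vector<float, 8> > typed;            // expected error: other resource type
// cbuffer Params { vector<float, 8> bias; };   // expected error: cbuffer member
// float4 main(vector<float, 8> v : TEXCOORD0) : SV_Target; // expected error: shader signature

// Long vectors remain usable as parameters and return values of non-entry functions.
vector<float, 8> Larger(vector<float, 8> a, vector<float, 8> b) {
  return max(a, b);
}
```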
|
||||||||
|
||||||||
**Constructing vectors** | ||||||||
#### Constructing vectors | ||||||||
|
||||||||
HLSL vectors can be constructed through initializer lists and constructor syntax initializing or by assignment. | ||||||||
HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment. | ||||||||
|
||||||||
Examples: | ||||||||
|
||||||||
``` | ||||||||
vector<uint, 5> vecA = {1, 2, 3, 4, 5}; | ||||||||
``` hlsl | ||||||||
|
||||||||
vector<uint, 5> vecA = {1, 2, 3, 4, 5}; | ||||||||
vector<uint, 6> vecB = vector<uint, 6>(6, 7, 8, 9, 0, 0); | ||||||||
uint4 initval = {0, 0, 0, 0}; | ||||||||
vector<uint, 8> vecC = {uint2(coord.xy), vecB}; | ||||||||
vector<uint, 6> vecD = vecB; | ||||||||
``` | ||||||||
|
||||||||
**Load and Store vectors from Buffers/Arrays** | ||||||||
#### Vectors in Raw Buffers | ||||||||
|
||||||||
For loading and storing N-dimensional vectors from ByteAddressBuffers we use the `LoadN` and `StoreN` methods, extending | ||||||||
the existing Load/Store, Load2/Store2, Load3/Store3 and Load4/Store4 methods. | ||||||||
N-element vectors are loaded and stored from ByteAddressBuffers using the templated load and store methods | ||||||||
with a vector type of the required size as the template parameter and byte offset parameters. | ||||||||
Review comment: Should we discuss alignment requirements at all here?
Reply: Do you have any suggestions on how that discussion might look?
Reply: You could say that a long vector <T,N> has the same alignment requirements as an array of N elements of type T. |
||||||||
|
||||||||
``` | ||||||||
// Load/Store from [RW]ByteAddressBuffers | ||||||||
```hlsl | ||||||||
RWByteAddressBuffer myBuffer; | ||||||||
|
||||||||
vector<uint, N> val = myBuffer.LoadN(uint StartOffsetInBytes); | ||||||||
myBuffer.StoreN<T>(uint StartoffsetInBytes, vector<T, N> stVec); | ||||||||
vector<T, N> val = myBuffer.Load< vector<T, N> >(StartOffsetInBytes); | ||||||||
myBuffer.Store< vector<T, N> >(StartOffsetInBytes + 100, val); | ||||||||
|
||||||||
``` | ||||||||
|
||||||||
StructuredBuffers with N-element vectors are declared using the template syntax | ||||||||
with a long vector type as the template parameter. | ||||||||
N-element vectors are loaded and stored from StructuredBuffers using the load and store methods | ||||||||
with the element index parameters. | ||||||||
|
||||||||
```hlsl | ||||||||
RWStructuredBuffer< vector<T, N> > myBuffer; | ||||||||
Review comment: Is the space between
Reply: HLSL 2021 templates are C++98/03-era, so the space is required. |
||||||||
|
||||||||
// Load/Store from groupshared arrays | ||||||||
groupshared T inputArray[512]; | ||||||||
groupshared T outputArray[512]; | ||||||||
vector<T, N> val = myBuffer.Load(elementIndex); | ||||||||
myBuffer.Store(elementIndex, val); | ||||||||
|
||||||||
Load(vector<T,N> ldVec, groupshared inputArray, uint offsetInBytes); | ||||||||
Store(vector<T,N> stVec, groupshared outputArray, uint offsetInBytes); | ||||||||
``` | ||||||||
|
||||||||
**Operations on vectors** | ||||||||
#### Accessing elements of long vectors | ||||||||
|
||||||||
Long vectors support the existing vector subscript operators to return the scalar element values. | ||||||||
|
||||||||
Long vectors do not support swizzle operations, as swizzles are limited to only the first four elements. | ||||||||
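As an illustration of element access under these rules, the sketch below sums a long vector with the subscript operator. It assumes a compiler that implements this proposal; `SumElements` is a hypothetical helper, not part of the proposal.

```hlsl
float SumElements(vector<float, 8> v) {
  float sum = 0.0;
  for (uint i = 0; i < 8; ++i)
    sum += v[i];        // subscript access returns the scalar element
  // float first = v.x; // expected error: swizzle accessors are not allowed on long vectors
  return sum;
}
```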
|
||||||||
|
||||||||
#### Operations on long vectors | ||||||||
|
||||||||
Support all HLSL intrinsics that are important as activation functions: fma, exp, log, tanh, atan, min, max, clamp, and | ||||||||
step. Eventually support all HLSL operators and math intrinsics that are currently enabled for vectors. | ||||||||
Support all HLSL intrinsics that perform [elementwise calculations](NNNN-dxil-vectors.md#elementwise-intrinsics) | ||||||||
Review comment: While defining the HLSL language, I do not understand the need to link to the DXIL proposal as if it defines what's meant by "elementwise calculations" here.
Reply: I don't consider this a language spec. I structured it as a shader model spec. There's no "as if" about it. I defined what elementwise calculations are in the dxil vectors spec. I don't really care where that definition ends up, but it's useful to have because there isn't anywhere else. Since I wrote this spec to depend on that one, it made sense to put it there. As it stands, neither spec technically depends on the other because long vectors can be implemented with scalarization and native vectors can be used without changing the allowed vector length. The lack of any hard interdependence inclines me further toward leaving this as having a DXIL element as well as language elements even if it's just discussion of validation changes. |
||||||||
that take parameters that could be long vectors and whose function doesn't limit them to shorter vectors. | ||||||||
These are operations that perform the same operation on an element regardless of its position in the vector | ||||||||
except that the position indicates which element(s) of other vector parameters might be used in that calculation. | ||||||||
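To make the elementwise definition concrete, here is a small sketch of an activation step on 16-element vectors, assuming a compiler that implements this proposal. `Activate` and its parameter names are hypothetical; `mad` and `tanh` are taken from the allowed intrinsics listed in this proposal.

```hlsl
vector<float, 16> Activate(vector<float, 16> x,
                           vector<float, 16> w,
                           vector<float, 16> b) {
  // mad is elementwise: y[i] = x[i] * w[i] + b[i] for each i in [0, 16).
  vector<float, 16> y = mad(x, w, b);
  // tanh is applied to each element independently of its position.
  return tanh(y);
}
```

Each output element depends only on the elements at the same index of the inputs, which is the property the paragraph above describes.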
|
||||||||
Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions). | ||||||||
|
||||||||
Note: Additionally any mathematical operations missing from the above list but needed as activation functions for neural | ||||||||
network computations will be added. | ||||||||
#### Allowed elementwise vector intrinsics | ||||||||
|
||||||||
* Trigonometry : acos, asin, atan, atan2, cos, cosh, degrees, radians, sin, sinh, tan, tanh | ||||||||
* Math: abs, ceil, clamp, exp, exp2, floor, fma, fmod, frac, frexp, ldexp, lerp, log, log10, log2, mad, max, min, pow, rcp, round, rsqrt, sign, smoothstep, sqrt, step, trunc | ||||||||
* Float Ops: f16tof32, f32tof16, isfinite, isinf, isnan, modf, saturate | ||||||||
* Bitwise Ops: reversebits, countbits, firstbithigh, firstbitlow | ||||||||
* Logic Ops: and, or, select | ||||||||
* Reductions: all, any, clamp, dot | ||||||||
* Quad Ops: ddx, ddx_coarse, ddx_fine, ddy, ddy_coarse, ddy_fine, fwidth, QuadReadLaneAt, QuadReadLaneAcrossX, QuadReadLaneAcrossY, QuadReadLaneAcrossDiagonal | ||||||||
* Wave Ops: WaveActiveBitAnd, WaveActiveBitOr, WaveActiveBitXor, WaveActiveProduct, WaveActiveSum, WaveActiveMin, WaveActiveMax, WaveMultiPrefixBitAnd, WaveMultiPrefixBitOr, WaveMultiPrefixBitXor, WaveMultiPrefixProduct, WaveMultiPrefixSum, WavePrefixSum, WavePrefixProduct, WaveReadLaneAt, WaveReadLaneFirst | ||||||||
* Wave Reductions: WaveActiveAllEqual, WaveMatch | ||||||||
|
||||||||
#### Native vector intrinsics | ||||||||
|
||||||||
|
||||||||
Of the above list, the following will produce the appropriate unary, binary, or ternary | ||||||||
DXIL intrinsics that take native vector parameters: | ||||||||
|
||||||||
* fma | ||||||||
* exp | ||||||||
* log | ||||||||
* tanh | ||||||||
* atan | ||||||||
* min | ||||||||
* max | ||||||||
* clamp | ||||||||
* step | ||||||||
|
||||||||
#### Disallowed vector intrinsics | ||||||||
|
||||||||
* Only applicable to shorter vectors: AddUint64, asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex | ||||||||
Review comment: What is the reasoning for disallowing the conversion functions?
Reply: Sorry I didn't address this. Some of these comments didn't show up in my review of the files. I think you're referring to the as functions? There is some variability there. Some of them map to a simple bitcast. Those are fine. Some of them take multiple parameters representing low and hi bits that don't map as neatly.
Reply: Couldn't some of the bitcast intrinsics (
Reply: I don't know why these comments didn't show up in my pass of the file diff. As I mentioned above in comment #361 (comment), I agree some of these should be supported. |
||||||||
* Only useful for disallowed variables: EvaluateAttributeAtSample, EvaluateAttributeCentroid, EvaluateAttributeSnapped, GetAttributeAtVertex | ||||||||
|
||||||||
### Debug Support | ||||||||
First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience. Open Issue: Handle DXIL scalarized and vector paths. | ||||||||
|
||||||||
First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience. Open Issue: Handle DXIL scalarized and vector paths. | ||||||||
|
||||||||
### Diagnostic Changes | ||||||||
|
||||||||
* Additional error messages for illegal or unsupported use of arbitrary length vectors. | ||||||||
* Remove current bound checks (N <= 4) for vector size in supported cases, both HLSL and DXIL. | ||||||||
Error messages should be produced for use of long vectors in unsupported interfaces. | ||||||||
|
||||||||
* The shader signature. | ||||||||
* A cbuffer/tbuffer. | ||||||||
* A work graph record. | ||||||||
* A mesh or ray tracing payload. | ||||||||
Review comment: Also, don't forget the use of such structures in each relevant intrinsic rather than just the entry functions. (
Reply: I have added a lot of specificity to this list now. |
||||||||
|
||||||||
### Validation Changes | ||||||||
Errors should also be produced when long vectors are passed to intrinsics that take | ||||||||
variable-length vector parameters but do not permit long vectors, as listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics). | ||||||||
Attempting to use any swizzle member-style accessors on long vectors should produce an error. | ||||||||
Declaring vectors of length longer than 128 should produce an error. | ||||||||
|
||||||||
* What additional validation failures does this introduce? | ||||||||
*Illegal uses of vectors should produce errors* | ||||||||
* What existing validation failures does this remove? | ||||||||
*Allow legal uses of vectors with number of components greater than 4* | ||||||||
### Validation Changes | ||||||||
|
||||||||
## D3D12 API Additions | ||||||||
Validation should produce errors when a long vector is found in: | ||||||||
Review comment: I think DXIL validation should be left to the DXIL spec, no?
Reply: I did not find it useful to divide the specs along interchange format and language lines. The DXIL spec would have included elements of native vectors and also of long vectors while the language spec would have contained only part of the long vectors description. As such, I created two shader model features that can, but don't have to include language changes. This one includes language and DXIL changes. "The DXIL spec" only has DXIL changes, but its characteristic feature is that it documents native DXIL vectors. |
||||||||
|
||||||||
TODO: Possible checks for DXIL vector support and tiered support. | ||||||||
* The shader signature. | ||||||||
* A cbuffer/tbuffer. | ||||||||
* A work graph record. | ||||||||
* A mesh or ray tracing payload. | ||||||||
|
||||||||
## Check Feature Support | ||||||||
Use of long vectors in unsupported intrinsics should produce validation errors. | ||||||||
|
||||||||
Open Issue: Can implementations support vector DXIL? | ||||||||
## Runtime Additions | ||||||||
|
||||||||
|
||||||||
### Minimum Support Set | ||||||||
Support for Long vectors requires dxil vector support as defined in [the specification](NNNN-dxil-vectors.md). | ||||||||
|
||||||||
Use of long vectors in a shader should be indicated in DXIL with the corresponding | ||||||||
shader model version and shader feature flag. | ||||||||
|
||||||||
## Testing | ||||||||
|
||||||||
* How will correct codegen for DXIL/SPIRV be tested? | ||||||||
* How will the diagnostics be tested? | ||||||||
* How will validation errors be tested? | ||||||||
* How will validation of new DXIL elements be tested? | ||||||||
* A: *unit tests in dxc* | ||||||||
* How will the execution results be tested? | ||||||||
* A: *HLK tests* | ||||||||
### Compilation Testing | ||||||||
|
||||||||
#### Correct output testing | ||||||||
Review comment: Here's where I think we describe testing for each supported backend IR. Shouldn't we be giving SPIR-V some love here as well? Shouldn't there be a section before this that outlines various valid scenarios for AST testing? For a given implementation, perhaps there would be an additional infra spec to outline tests for initial codegen and various important phases through the compiler as well, right? Perhaps the AST test scenarios belong there too?
Reply: Definitely. I haven't done that research just yet. I've added a slightly hand-wavey allusion to the SPIR-V equivalent.
Have we done this in the past? I'm not sure what scenarios you have in mind. I'd like to see an example to better understand.
I think infra specs are useful to discuss forthcoming implementations. Given the state of this implementation, I don't think writing one would be as productive as carefully documenting what has been done in code and commit comments. I think that would be more likely to be preserved and found by future generations of coders. |
||||||||
|
||||||||
Verify that long vectors can be declared in all appropriate contexts: | ||||||||
|
||||||||
* local variables | ||||||||
* non-entry parameters | ||||||||
* non-entry return types | ||||||||
* StructuredBuffer elements | ||||||||
* Templated Load/Store methods on ByteAddressBuffers | ||||||||
* As members of arrays and structs in any of the above contexts | ||||||||
|
||||||||
Verify that long vectors in supported intrinsics produce appropriate outputs. | ||||||||
For the intrinsic functions listed in [Native vector intrinsics](#native-vector-intrinsics), | ||||||||
the generated DXIL intrinsic calls will have long vector parameters. | ||||||||
For other elementwise vector intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics), | ||||||||
the generated DXIL should scalarize the parameters and produce scalar calls to the corresponding DXIL intrinsics. | ||||||||
|
||||||||
Verify that long vector elements can be accessed using the subscript operation. | ||||||||
|
||||||||
#### Invalid usage testing | ||||||||
|
||||||||
Verify that compilation errors are produced for long vectors used in: | ||||||||
|
||||||||
* Entry function parameters | ||||||||
* Entry function returns | ||||||||
* Type buffer declarations | ||||||||
* Cbuffer blocks | ||||||||
* Cbuffer global variables | ||||||||
* Work graph records | ||||||||
* Mesh and ray tracing payloads | ||||||||
* Any intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics) | ||||||||
* All swizzle operations (e.g. `lvec.x`, `lvec.rg`, `lvec.wzyx`) | ||||||||
|
||||||||
### Validation Testing | ||||||||
Review comment: I think DXIL validation belongs in the DXIL spec only. |
||||||||
|
||||||||
Verify that Validation produces errors for any DXIL intrinsic that corresponds to the | ||||||||
HLSL intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics). | ||||||||
Verify that Validation produces errors for any DXIL intrinsic with native vector parameters | ||||||||
that corresponds to one of the [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics) | ||||||||
and is not listed in [native vector intrinsics](#native-vector-intrinsics). | ||||||||
|
||||||||
### Execution Testing | ||||||||
Review comment: I think this section belongs in the DXIL spec exclusively, even though in practice most of our execution tests in DXC start with HLSL when testing DXIL backends.
Reply: I think execution testing may be the strongest argument for keeping it here. Long and native vectors aren't really interdependent and although the tests would likely share a lot of code, they should be independently testable in execution testing. |
||||||||
|
||||||||
Correct behavior for all of the intrinsics listed in [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics) | ||||||||
will be verified with execution tests that perform the operations on long vectors and confirm correct results | ||||||||
for the given test values. | ||||||||
Where possible, these tests will be variations on existing tests for these intrinsics. | ||||||||
|
||||||||
## Alternatives considered | ||||||||
|
||||||||
Our original proposal introduced an opaque Cooperative Vector type to HLSL to limit the scope of the feature to small | ||||||||
neural network evaluation and also contain the scope for testing. But aligning with the long term roadmap of HLSL to | ||||||||
enable generic vectors, it makes sense to not introduce a new datatype but use HLSL vectors, even if the initial | ||||||||
implementation only exposes partial functionality. | ||||||||
The original proposal introduced an opaque type to HLSL that could represent longer vectors. | ||||||||
This would have been used only for native vector operations. | ||||||||
This would have limited the scope of the feature to small neural network evaluation and also somewhat contained the scope for testing. | ||||||||
|
||||||||
Representing vectors used in neural networks as LLVM vectors also allows leveraging existing optimizations. | ||||||||
This direction also aligns with the long term roadmap of HLSL to enable generic vectors. | ||||||||
Since the new data type would have required extensive testing as well, | ||||||||
the testing burden saved may not have been substantial. | ||||||||
Since these vectors are to be added eventually anyway, the testing serves multiple purposes. | ||||||||
It makes sense to not introduce a new datatype but use HLSL vectors, | ||||||||
even if the initial implementation only exposes partial functionality. | ||||||||
|
||||||||
## Open Issues | ||||||||
|
||||||||
* Q: Is there a limit on the number of components in a vector? | ||||||||
* A: Chose a number based on precedents set by other languages. Support at least 128. | ||||||||
* A: 128. It's big enough for some known uses. | ||||||||
There aren't concrete reasons to restrict the vector length. | ||||||||
Having a limit facilitates testing and sets expectations for both hardware and software developers. | ||||||||
|
||||||||
* Q: Usage restrictions | ||||||||
* A: *General vectors (N > 4) are not permitted inside structs.* | ||||||||
* A: Long vectors may not form part of the shader signature. | ||||||||
There are many restrictions on signature elements including bit fields that determine if they are fully written. | ||||||||
By definition, these involve more interfaces that would require additional changes and testing. | ||||||||
* Q: Does this have implications for existing HLSL source code compatibility? | ||||||||
* A: *No, existing HLSL code is unaffected by this change.* | ||||||||
* A: *Change the default N = 4 for vectors? Will affect existing shaders.* | ||||||||
* Q: How will SPIRV be supported? | ||||||||
* A: | ||||||||
* Q: When do HLSL vectors remain as vectors and when do they get scalarized in DXIL? | ||||||||
* A: | ||||||||
* Q: Can all implementations support vector DXIL? | ||||||||
* A: Feature check? | ||||||||
|
||||||||
## Acknowledgments | ||||||||
|
||||||||
* A: Existing HLSL code that makes no use of long vectors will have no semantic changes. | ||||||||
* Q: Should this change the default N = 4 for vectors? | ||||||||
* A: No. While the default size of 4 is less intuitive in a world of larger vectors, existing code depends on this default, so it remains unchanged. | ||||||||
* Q: How will SPIR-V be supported? | ||||||||
* A: TBD | ||||||||
* Q: Should swizzle accessors be allowed for long vectors? | ||||||||
* A: No. It doesn't make sense since they can't be used to access all elements | ||||||||
and there's no way to create enough swizzle members to accommodate the longest allowed vector. | ||||||||
* Q: How should scalar groupshared arrays be loaded/stored into/out of long vectors? | ||||||||
* A: After some consideration, we opted not to include explicit Load/Store operations for this purpose. | ||||||||
There are at least a couple of ways this could be resolved, and the preferred solution is outside the scope of this proposal. | ||||||||
Comment on lines +321 to +323
Does this mean that long vectors can be marked as groupshared and it just works? Wondering if this Q&A belongs in the DXIL spec?

groupshared is allowed. I don't list it as being disallowed, but I don't mind calling it out explicitly.

Maybe I'm missing the context or the significance of not including explicit Load/Store operations here, or just plain misunderstanding this Q/A altogether. The "preferred solution is outside the scope" seems to suggest that there's not going to be a way to use long vectors in groupshared.

It's not a feature that is directly connected with either long or native vectors. The core problem is that groupshared memory is limited and explicitly allocating it to one purpose is expensive. Instead, many users want to reuse groupshared memory for a few different purposes depending on stage and time. There's some disagreement on how we can best solve this. We might provide what is essentially a groupshared rawbuffer with mechanisms to perform loads and stores on it similarly to how we do on global rawbuffers. That was the alternative approach previously described. There are more clever compiler things (tm) that could be done or other ways of enabling the user in this way.

Since it is not a problem directly connected with this, since there are a few different approaches we might take that would take a long time to decide on and implement, and since there are other solutions to the problem using existing mechanisms, we deemed it out of scope for this project.

Ok. I think that this Q/A is a bit confusing as written now. If I understand correctly, it's saying that we're not going to tackle solving the problem of people wanting buffer-like access to groupshared memory. So in the same way that we don't support explicit Load/Store operations for other data types, we're not going to add new ones for this. Marking a long vector as groupshared is still expected to work.

What you've said is accurate, so I don't know why you think it's confusing. I'm happy to take another run at it, but I think it's succinct and sufficient. It doesn't go into detail because I don't think we need to. It's a feature for another day. I could create an issue for it that could contain all the details if that would help.

I was only able to reach the understanding through this conversation.
||||||||
|
||||||||
<!-- {% endraw %} --> |
Always doesn't indicate enough that this is past behavior, and `as many as` just read awkward to me. Also replaced the repeat of the word element with `scalar`.