diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md new file mode 100644 index 00000000..75423baa --- /dev/null +++ b/proposals/0026-hlsl-long-vector-type.md @@ -0,0 +1,325 @@ + + +# HLSL Long Vectors + +* Proposal: [0026-HLSL-Vectors](0026-hlsl-long-vector-type.md) +* Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra), [Greg Roth](https://github.com/pow2clk) +* Sponsor: [Greg Roth](https://github.com/pow2clk) +* Status: **Under Consideration** + +## Introduction + +HLSL has previously supported vectors of as many as four elements (int3, float4, etc.). +These are useful in a traditional graphics context for representation and manipulation of + geometry and color information. +The evolution of HLSL as a more general purpose language targeting Graphics and Compute + greatly benefits from longer vectors to fully represent these operations rather than trying to + break them down into smaller constituent vectors. +This feature adds the ability to load, store, and perform elementwise operations on HLSL + vectors longer than four elements. + +## Motivation + +The adoption of machine learning techniques expressed as vector-matrix operations + requires larger vector sizes to be representable in HLSL. +To take advantage of specialized hardware that can accelerate longer vector operations, + these vectors need to be preserved in the exchange format as well. + +## Proposed solution + +Enable vectors of length between 5 and 128 inclusive in HLSL using existing template-based vector declarations. +Such vectors will hereafter be referred to as "long vectors". +These will be supported for all elementwise intrinsics that take variable-length vector parameters. +For certain operations, these vectors will be represented as native vectors using + [Dxil vectors](NNNN-dxil-vectors.md) and equivalent SPIR-V representations. 
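+
+As a minimal sketch of what this enables, a long vector could be declared with the template syntax and passed to elementwise intrinsics.
+The function name `Activate` and the length 16 are illustrative assumptions, not part of the proposal.
+
+```hlsl
+// Hypothetical helper: a ReLU-style activation over a 16-element vector.
+vector<float, 16> Activate(vector<float, 16> x) {
+  vector<float, 16> zero = 0.0; // scalar assigned to every element
+  return max(x, zero);          // elementwise intrinsic applied per element
+}
+```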
+ +## Detailed design + +### HLSL vectors + +Currently HLSL allows declaring vectors using a templated representation: + +```hlsl +vector<T, N> name; +``` + +`T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type. +`N` is the number of components and must be an integer between 1 and 4 inclusive. +See the vector definition [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more details. +This proposal adds support for long vectors of length greater than 4 by + allowing `N` to be greater than 4 where previously such a declaration would produce an error. + +The default behavior of HLSL vectors is preserved for backward compatibility, meaning that omitting the last parameter `N` +defaults to a 4-component vector and the declaration `vector name;` declares a 4-component float vector, etc. More examples +[here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector). +Declarations of long vectors require the use of the template declaration. +Unlike vector sizes between 1 and 4, no shorthand declarations that concatenate + the element type and number of elements (e.g. float2, double4) are allowed for long vectors. + +#### Allowed Usage + +The new vectors will be supported in all shader stages including Node shaders. + +Long vectors can be: + +* Elements of arrays, structs, StructuredBuffers, and ByteAddressBuffers. +* Parameters and return types of non-entry functions. +* Stored in groupshared memory. +* Static global variables. + +Long vectors are not permitted in: + +* Resource types other than ByteAddressBuffer or StructuredBuffer. +* Any part of the shader's signature including entry function parameters and return types or + user-defined struct parameters. +* Cbuffers or tbuffers. +* Ray tracing `Parameter`, `Attributes`, or `Payload` parameter structures. +* A work graph record. 
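+
+The two lists above can be illustrated with a short sketch.
+All identifiers here (`Weights`, `gsCache`, `gTable`, `outBuf`, `Scale`) and the chosen lengths are hypothetical, and the commented-out declarations only indicate the kind of usage that is expected to be rejected.
+
+```hlsl
+struct Weights {
+  vector<float, 16> row;                        // allowed: struct member
+};
+
+groupshared vector<float, 16> gsCache[4];       // allowed: groupshared array elements
+static vector<int, 8> gTable;                   // allowed: static global variable
+RWStructuredBuffer<vector<float, 16> > outBuf;  // allowed: StructuredBuffer element
+
+// allowed: parameter and return type of a non-entry function
+vector<float, 16> Scale(vector<float, 16> v, float s) {
+  return v * s;
+}
+
+// Not permitted; each of these is expected to produce an error:
+// Buffer<vector<float, 16> > typedBuf;         // typed buffer element
+// cbuffer Constants { vector<float, 16> c; };  // cbuffer member
+// float4 main(vector<float, 16> v : TEXCOORD0) : SV_Target; // entry signature
+```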
+ +While this describes where long vectors can be used and later sections will describe how, +implementations may specify best practices in certain uses for optimal performance. + +#### Constructing vectors + +HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment. +Vectors can be initialized and assigned from various casting operations including scalars and arrays. +Long vectors will maintain equivalent casting abilities. + +Examples: + +```hlsl +vector InitList = {1, 2, 3, 4, 5}; +vector Construct = vector(6, 7, 8, 9, 0, 0); +uint4 initval = {0, 0, 0, 0}; +vector VecVec = {uint2(coord.xy), vecB}; +vector Assigned = vecB; +float arr[5]; +vector CastArr = (vector)arr; +vector ArrScal = {arr, 7.9}; +vector ArrArr = {arr, arr}; +vector Scal = 4.2; +``` + +#### Vectors in Raw Buffers + +N-element vectors are loaded from and stored to ByteAddressBuffers using the templated load and store methods +with a vector type of the required size as the template parameter and byte offset parameters. + +```hlsl +RWByteAddressBuffer myBuffer; + +vector<T, N> val = myBuffer.Load< vector<T, N> >(StartOffsetInBytes); +myBuffer.Store< vector<T, N> >(StartOffsetInBytes + 100, val); + +``` + +StructuredBuffers with N-element vectors are declared using the template syntax + with a long vector type as the template parameter. +N-element vectors are loaded from and stored to StructuredBuffers using the load and store methods +with the element index parameters. + +```hlsl +RWStructuredBuffer< vector<T, N> > myBuffer; + +vector<T, N> val = myBuffer.Load(elementIndex); +myBuffer.Store(elementIndex, val); + +``` + +#### Accessing elements of long vectors + +Long vectors support the existing vector subscript operators `[]` to access the scalar element values. +They do not support any swizzle operations. 
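+
+A brief sketch of subscript access follows.
+The function name `SumElements` and the length 8 are assumptions made for illustration only.
+
+```hlsl
+// Hypothetical helper: sums the elements of an 8-element vector.
+float SumElements(vector<float, 8> v) {
+  float sum = 0.0;
+  for (uint i = 0; i < 8; ++i)
+    sum += v[i];     // dynamic index through the subscript operator
+  return sum + v[0]; // static indices work the same way
+}
+
+// A swizzle such as v.x or v.rg on a long vector is expected to be an error.
+```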
+ +#### Operations on long vectors + +Support all HLSL intrinsics that perform [elementwise calculations](NNNN-dxil-vectors.md#elementwise-intrinsics) + that take parameters that could be long vectors and whose function doesn't limit them to shorter vectors. +These are operations that perform the same operation on an element regardless of its position in the vector + except that the position indicates which element(s) of other vector parameters might be used in that calculation. + +Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions). + +#### Allowed elementwise vector intrinsics + +* Trigonometry: acos, asin, atan, atan2, cos, cosh, degrees, radians, sin, sinh, tan, tanh +* Math: abs, ceil, clamp, exp, exp2, floor, fma, fmod, frac, frexp, ldexp, lerp, log, log10, log2, mad, max, min, pow, rcp, round, rsqrt, sign, smoothstep, sqrt, step, trunc +* Float Ops: f16tof32, f32tof16, isfinite, isinf, isnan, modf, saturate +* Bitwise Ops: reversebits, countbits, firstbithigh, firstbitlow +* Logic Ops: and, or, select +* Reductions: all, any, dot +* Quad Ops: ddx, ddx_coarse, ddx_fine, ddy, ddy_coarse, ddy_fine, fwidth, QuadReadLaneAt, QuadReadLaneAcrossX, QuadReadLaneAcrossY, QuadReadLaneAcrossDiagonal +* Wave Ops: WaveActiveBitAnd, WaveActiveBitOr, WaveActiveBitXor, WaveActiveProduct, WaveActiveSum, WaveActiveMin, WaveActiveMax, WaveMultiPrefixBitAnd, WaveMultiPrefixBitOr, WaveMultiPrefixBitXor, WaveMultiPrefixProduct, WaveMultiPrefixSum, WavePrefixSum, WavePrefixProduct, WaveReadLaneAt, WaveReadLaneFirst +* Wave Reductions: WaveActiveAllEqual, WaveMatch +* Type Conversions: asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16 + +#### Disallowed vector intrinsics + +* Only applicable to shorter vectors: AddUint64, D3DCOLORtoUBYTE4, cross, 
distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex +* Only useful for disallowed variables: EvaluateAttributeAtSample, EvaluateAttributeCentroid, EvaluateAttributeSnapped, GetAttributeAtVertex + +### Interchange Format Additions + +Long vectors can be represented in DXIL, SPIR-V or other interchange formats as scalarized elements or native vectors. +Representation of native vectors in DXIL depends on [dxil vectors](NNNN-dxil-vectors.md). + +### Debug Support + +First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience. +These should enable tracking vectors through their scalarized and native vector usages. + +### Diagnostic Changes + +Error messages should be produced for use of long vectors in unsupported interfaces: + +* Typed buffer element types. +* Parameters to the entry function. +* Return types from the entry function. +* Cbuffer blocks. +* Cbuffer global variables. +* Tbuffers. +* Work graph records. +* Mesh/amplification payload entry parameter structures. +* `Payload`, `Parameter`, and `Attributes` parameter user-defined structs used in + `TraceRay()`, `CallShader()`, and `ReportHit()` ray tracing intrinsics. + +Errors should also be produced when long vectors are used as parameters to intrinsics + with vector parameters of variable length, but aren't permitted as listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics). +Attempting to use any swizzle member-style accessors on long vectors should produce an error. +Declaring vectors of length longer than 128 should produce an error. + +### Validation Changes + +Validation should produce errors when a long vector is found in: + +* The shader signature. +* A cbuffer/tbuffer. +* Work graph records. +* `Payload`, `Parameter`, and `Attributes` parameter user-defined structs used in + `TraceRay()`, `CallShader()`, and `ReportHit()` ray tracing intrinsics. 
+* Metadata. + +Note that disallowing long vectors in entry function signatures includes any user-defined structs + used in mesh and ray tracing shaders. + +Use of long vectors in unsupported intrinsics should produce validation errors. + +### Device Capability + +Devices that support Shader Model 6.9 will be required to fully support this feature. + +## Testing + +### Compilation Testing + +#### Correct output testing + +Verify that long vectors can be declared in all appropriate contexts: + +* Local variables. +* Static global variables. +* Non-entry parameters. +* Non-entry return types. +* StructuredBuffer elements. +* Templated Load/Store methods on ByteAddressBuffers. +* As members of arrays and structs in any of the above contexts. + +Verify that long vectors can be correctly initialized in all the forms listed in [Constructing vectors](#constructing-vectors). + +Verify that long vectors in supported intrinsics produce appropriate outputs. +Supported intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics) + may produce intrinsic calls with native vector parameters where available + or scalarized parameters with individual scalar calls to the corresponding interchange format intrinsics. + +Verify that long vector elements can be accessed using the subscript operation with static or dynamic indices. + +Verify that long vectors of different sizes will reference different overloads of user and built-in functions. +Verify that template instantiation using long vectors correctly creates variants for the right sizes. + +Verification of correct interchange format output depends on the implementation and representation. +Native vector DXIL intrinsics might be checked for as described in [Dxil vectors](NNNN-dxil-vectors.md) + if native DXIL vector output is supported. +SPIR-V equivalent output should be checked as well. +Scalarized representations are also possible depending on the compilation implementation. 
+ +#### Invalid usage testing + +Verify that long vectors produce compilation errors when: + +* Declared in interfaces listed in [Diagnostic changes](#diagnostic-changes). +* Passed as parameters to any intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics). +* Accessed using any swizzle operation (e.g. `lvec.x`, `lvec.rg`, `lvec.wzyx`). +* Declared with a length over the maximum size in any of the allowed contexts listed in [Allowed usage](#allowed-usage). + +### Validation Testing + +Verify that long vectors produce validation errors in: + +* Each element of the shader signature. +* A cbuffer block struct. +* Work graph record structs. +* The mesh/amplification entry `Payload` parameter struct. +* Each of the `Payload`, `Parameter`, `Attributes` parameter structs used in + `TraceRay()`, `CallShader()`, and `ReportHit()`, + and `anyhit`, `closesthit`, `miss`, and `callable` entry functions. +* Any DXIL intrinsic that corresponds to the HLSL intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics). +* Any metadata type. + +### Execution Testing + +Correct behavior for all of the intrinsics listed in [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics) + will be verified with execution tests that perform the operations on long vectors and confirm correct results + for the given test values. +Where possible, these tests will be variations on existing tests for these intrinsics. + +## Alternatives considered + +The original proposal introduced an opaque type to HLSL that could represent longer vectors. +This would have been used only for native vector operations. +This would have limited the scope of the feature to small neural network evaluation and would also have contained the scope for testing somewhat. + +Representing vectors used in neural networks as LLVM vectors also allows leveraging existing optimizations. 
+This direction also aligns with the long term roadmap of HLSL to enable generic vectors. +Since the new data type would have required extensive testing as well, +the testing burden saved may not have been substantial. +Since these vectors are to be added eventually anyway, the testing serves multiple purposes. +It makes sense to not introduce a new datatype but use HLSL vectors, +even if the initial implementation only exposes partial functionality. + +The restrictions outlined in [Allowed Usage](#allowed-usage) were chosen because the excluded uses weren't + needed for the targeted applications, not because they are inherently impossible. +They were omitted because of unclear utility and to simplify the design. +There's nothing about those use cases that is inherently incompatible with long vectors + and future work might consider relaxing those restrictions. + +Swizzle operations were not supported because they are limited to the first four elements. +The accessors (xyzw or rgba) are named according to the expected content of + those vectors in a graphics context. +Since that interpretation does not apply to longer vectors, it could be confusing. +The subscript access is flexible and generic and makes other accessors redundant. + +## Open Issues + +* Q: Is there a limit on the Number of Components in a vector? + * A: 128. It's big enough for some known uses. +There aren't concrete reasons to restrict the vector length. +Having a limit facilitates testing and sets expectations for both hardware and software developers. + +* Q: Usage restrictions + * A: Long vectors may not form part of the shader signature. + There are many restrictions on signature elements including bit fields that determine if they are fully written. + By definition, these involve more interfaces that would require additional changes and testing. +* Q: Does this have implications for existing HLSL source code compatibility? + * A: Existing HLSL code that makes no use of long vectors will have no semantic changes. 
+* Q: Should this change the default N = 4 for vectors? + * A: No. While the default size of 4 is less intuitive in a world of larger vectors, existing code depends on this default, so it remains unchanged. +* Q: How will SPIR-V be supported? + * A: TBD +* Q: Should swizzle accessors be allowed for long vectors? + * A: No. It doesn't make sense since they can't be used to access all elements + and there's no way to create enough swizzle members to accommodate the longest allowed vector. +* Q: How should scalar groupshared arrays be loaded/stored into/out of long vectors? + * A: After some consideration, we opted not to include explicit Load/Store operations for this function. + There are at least a couple ways this could be resolved, and the preferred solution is outside the scope. + + \ No newline at end of file diff --git a/proposals/0026-hlsl-vector-type.md b/proposals/0026-hlsl-vector-type.md deleted file mode 100644 index 7f52585b..00000000 --- a/proposals/0026-hlsl-vector-type.md +++ /dev/null @@ -1,146 +0,0 @@ - - -* Proposal: [0026-HLSL-Vectors](0026-hlsl-vector-type.md) -* Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra) -* Sponsor: [Damyan Pepper](https://github.com/damyanp) -* Status: **Under Consideration** - -# HLSL Vectors - -## Introduction - -HLSL has supported vectors in a limited capacity (int3, float4, etc.), and these are scalarized in DXIL; small vectors while useful in a traditional graphics context do not scale well with the evolution on HLSL as a more general purpose language targetting Graphics and Compute. Notably, with the ubiquitous adoption of machine learning techniques which often get expressed as vector-matrix operations, there is a need for supporting larger vector sizes in HLSL and preserving these vector objects at the DXIL level to take advantage of specialized hardware that can accelerate vector operations. 
- -## Proposed solution - -Enable vectors of longer length in HLSL and preserve the vector type in DXIL. - -## Detailed design - -### HLSL vectors `vector` - -Currently HLSL allows `vector name;` where `T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type and `N`, number of -components, is a positive integer less than or equal to 4. See current definition [here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector). -This proposal extends this support to longer vectors (beyond 4). - -The default behavior of HLSL vectors is preserved for backward compatibility, meaning, skipping the last parameter `N` -defaults to 4-component vectors and the use `vector name;` declares a 4-component float vector, etc. More examples -[here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector). - -The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave -uniformity requirements, but implementations may specify best practices in certain uses for optimal performance. - -**Restrictions on the uses of vectors with N > 4** - -* Vectors with length greater than 4 are not permitted inside a `struct`. -* Vectors with length greater than 4 are not permitted as shader input/output parameters. - -**Constructing vectors** - -HLSL vectors can be constructed through initializer lists and constructor syntax initializing or by assignment. - -Examples: - -``` -vector vecA = {1, 2, 3, 4, 5}; -vector vecB = vector(6, 7, 8, 9, 0, 0); -uint4 initval = {0, 0, 0, 0}; -vector vecC = {uint2(coord.xy), vecB}; -vector vecD = vecB; -``` - -**Load and Store vectors from Buffers/Arrays** - -For loading and storing N-dimensional vectors from ByteAddressBuffers we use the `LoadN` and `StoreN` methods, extending -the existing Load/Store, Load2/Store2, Load3/Store3 and Load4/Store4 methods. 
- -``` -// Load/Store from [RW]ByteAddressBuffers -RWByteAddressBuffer myBuffer; - -vector val = myBuffer.LoadN(uint StartOffsetInBytes); -myBuffer.StoreN(uint StartoffsetInBytes, vector stVec); - -// Load/Store from groupshared arrays -groupshared T inputArray[512]; -groupshared T outputArray[512]; - -Load(vector ldVec, groupshared inputArray, uint offsetInBytes); -Store(vector stVec, groupshared outputArray, uint offsetInBytes); -``` - -**Operations on vectors** - -Support all HLSL intrinsics that are important as activation functions: fma, exp, log, tanh, atan, min, max, clamp, and -step. Eventually support all HLSL operators and math intrinsics that are currently enabled for vectors. - -Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions). - -Note: Additionally any mathematical operations missing from the above list but needed as activation functions for neural -network computations will be added. - -### Debug Support -First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience. Open Issue: Handle DXIL scalarized and vector paths. - - -### Diagnostic Changes - -* Additional error messages for illegal or unsupported use of arbitrary length vectors. -* Remove current bound checks (N <= 4) for vector size in supported cases, both HLSL and DXIL. - - -### Validation Changes - -* What additional validation failures does this introduce? -*Illegal uses of vectors should produce errors* -* What existing validation failures does this remove? -*Allow legal uses of vectors with number of components greater than 4* - -## D3D12 API Additions - -TODO: Possible checks for DXIL vector support and tiered support. 
- -## Check Feature Support - -Open Issue: Can implementations support vector DXIL? - -### Minimum Support Set - - -## Testing - -* How will correct codegen for DXIL/SPIRV be tested? -* How will the diagnostics be tested? -* How will validation errors be tested? -* How will validation of new DXIL elements be tested? -* A: *unit tests in dxc* -* How will the execution results be tested? -* A: *HLK tests* - - -## Alternatives considered - -Our original proposal introduced an opaque Cooperative Vector type to HLSL to limit the scope of the feature to small -neural network evaluation and also contain the scope for testing. But aligning with the long term roadmap of HLSL to -enable generic vectors, it makes sense to not introduce a new datatype but use HLSL vectors, even if the initial -implementation only exposes partial functionality. - -## Open Issues -* Q: Is there a limit on the Number of Components in a vector? -* A: Chose a number based on precedents set by other languages. Support atleast 128. -* Q: Usage restrictions -* A: *General vectors (N > 4) are not permitted inside structs.* -* Q: Does this have implications for existing HLSL source code compatibility? -* A: *No, existing HLSL code is unaffected by this change.* -* A: *Change the default N = 4 for vectors? Will affect existing shaders.* -* Q: How will SPIRV be supported? -* A: -* Q: When do HLSL vectors remain as vectors and when do they get scalarized in DXIL? -* A: -* Q: Can all implementations support vector DXIL? -* A: Feature check? 
- -## Acknowledgments - - - \ No newline at end of file diff --git a/proposals/NNNN-dxil-vectors.md b/proposals/NNNN-dxil-vectors.md new file mode 100644 index 00000000..c19cd933 --- /dev/null +++ b/proposals/NNNN-dxil-vectors.md @@ -0,0 +1,190 @@ + + +# DXIL Vectors + +--- + +* Proposal: [NNNN](NNNN-dxil-vectors.md) +* Author(s): [Greg Roth](https://github.com/pow2clk) +* Sponsor: [Greg Roth](https://github.com/pow2clk) +* Status: **Under Consideration** +* Planned Version: Shader Model 6.9 + +## Introduction + +While DXIL is intended and able to support language vectors, + those vectors must be broken up into individual scalars to be valid DXIL. +This feature introduces the ability to represent native vectors in DXIL for some uses. + +## Motivation + +Although many GPUs support vector operations, DXIL has been unable to directly leverage those capabilities. +Instead, it has scalarized all vector operations, losing their original representation. +To restore those vector representations, platforms have had to rely on auto-vectorization to + rematerialize vectors late in the compilation. +Scalarization is a trivial compiler transformation that never fails, + but auto-vectorization is a notoriously difficult compiler optimization that frequently generates sub-optimal code. +Allowing DXIL to retain vectors as they appeared in source allows hardware that can utilize + vector optimizations to do so more easily without penalizing hardware that requires scalarization. + +Native vector support can also help with the size of compiled DXIL programs. +Vector operations can express in a single instruction operations that would have taken N instructions in scalar DXIL. +This may allow reduced file sizes for compiled DXIL programs that utilize vectors. + +DXIL is based on LLVM 3.7 which already supports native vectors. +These could only be used to a limited degree in DXIL library targets, and never for DXIL operations. 
+This innate support is expected to make adding them a relatively low impact change to DXIL tools. + +## Proposed solution + +Native vectors are allowed in DXIL version 1.9 or greater. +These can be stored in allocas, static globals, groupshared variables, and SSA values. +They can be loaded from or stored to raw buffers and used as arguments to a selection + of element-wise intrinsic functions as well as the standard math operators. +They cannot be used in shader signatures, constant buffers, typed buffers, or texture types. + +## Detailed design + +### Vectors in memory representations + +In their alloca and variable representations, vectors in DXIL will always be represented as vectors. +Previously individual vectors would get scalarized into scalar arrays and arrays of vectors would be flattened + into a one-dimensional scalar array with indexing to reflect the original intents. +Individual vectors will now be represented as a single native vector and arrays of vectors will remain + as arrays of native vectors, though multi-dimensional arrays will still be flattened to one dimension. + +Single-element vectors are generally not valid in DXIL. +At the language level, they may be supported for corresponding intrinsic overloads, + but such vectors should be represented as scalars in the final DXIL output. +Since they only contain a single scalar, single-element vectors are + informationally equivalent to actual scalars. +Rather than include conversions to and from scalars and single-element vectors, + it is cleaner and functionally equivalent to represent these as scalars in DXIL. +The exception is in exported library functions, which need to maintain vector representations + to correctly match overloads when linking. + +### Changes to DXIL Intrinsics + +A new form of rawBufferLoad allows loading of full vectors instead of four scalars. +The status integer for tiled resource access is loaded just as before. 
+The returned vector value and the status indicator are grouped into a new `ResRet` helper structure type + that the load intrinsic returns. + +```asm + ; overloads: SM6.9: f16|f32|i16|i32 + ; returns: status, vector + declare %dx.types.ResRet.v[NUM][TY] @dx.op.rawBufferVectorLoad.v[NUM][TY]( + i32, ; opcode + %dx.types.Handle, ; resource handle + i32, ; coordinate c0 (byteOffset) + i32, ; coordinate c1 (elementOffset) + i32) ; alignment +``` + + +The return struct contains a single vector and a single integer representing mapped tile status. + +```asm + %dx.types.ResRet.v[NUM][TY] = type { vector<TYPE, NUM>, i32 } +``` + +Here and hereafter, `NUM` is the number of elements in the loaded vector, `TYPE` is the element type name, + and `TY` is the corresponding abbreviated type name (e.g. `i64`, `f32`). + +#### Vector access + +Dynamic accesses to vectors were previously converted to array accesses. +Native vectors can be accessed using `extractelement`, `insertelement`, or `getelementptr` operations. +Previously usage of `extractelement` and `insertelement` in DXIL didn't allow dynamic index parameters. + +#### Elementwise intrinsics + +A selection of elementwise intrinsics is given additional native vector forms. +Elementwise intrinsics are those that perform their calculations irrespective of the location of the element + in the vector or matrix arguments except insofar as that position corresponds to those of the other elements + that might be used in the individual element calculations. +An elementwise intrinsic `foo` that takes scalar or vector arguments could theoretically implement its vector version using a simple loop and the scalar intrinsic variant. 
+ +```c++ +vector<TYPE, NUM> foo(vector<TYPE, NUM> a, vector<TYPE, NUM> b) { + vector<TYPE, NUM> ret; + for (int i = 0; i < NUM; i++) + ret[i] = foo(a[i], b[i]); + return ret; +} +``` + +For example, `fma` is an elementwise intrinsic because it multiplies and adds the corresponding elements of its argument vectors, + but `cross` is not because it performs an operation on the vectors as units, + pulling elements from different locations as the operation requires. + +The elementwise intrinsics that have native vector variants represent the + unary, binary, and tertiary generic operations: + +```asm + <[NUM] x [TYPE]> @dx.op.unary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1) + <[NUM] x [TYPE]> @dx.op.binary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1, <[NUM] x [TYPE]> operand2) + <[NUM] x [TYPE]> @dx.op.tertiary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1, <[NUM] x [TYPE]> operand2, <[NUM] x [TYPE]> operand3) +``` + +The scalarized variants of these DXIL intrinsics will remain unchanged and can be used in conjunction + with the vector variants. +This means that the same language-level vector could be used in both scalarized operations and native vector operations + within the same shader by being scalarized as needed. + +### Validation Changes + +Blanket validation errors for use of native vectors in DXIL are removed. +Specific disallowed usages of native vector types will be determined by + examining arguments to operations and intrinsics and producing errors where appropriate. +Aggregate types will be recursed into to identify any native vector components. + +Native vectors should produce validation errors when: + +* Used in cbuffers. +* Used in unsupported intrinsics or operations as before, but made more specific to the operations. +* Used in shaders targeting previous shader models, apart from exported library functions. + +Errors should be produced for any representation of a single-element vector outside of + exported library functions. 
+ +Specific errors might be generated for invalid overloads of `LoadInput` and `StoreOutput` + as they represent usage of vectors in entry point signatures. + +### Device Capability + +Devices that support Shader Model 6.9 will be required to fully support this feature. + +## Testing + +### Compilation Testing + +A compiler targeting shader model 6.9 should be able to represent vectors in the supported memory spaces + in their native form and generate native calls for supported intrinsics. + +Test that appropriate output is produced: + +* Supported intrinsics and operations retain vector types. +* Dynamic indexing of vectors produces the correct `extractelement`, `insertelement` + operations with dynamic index parameters. + +### Validation testing + +The DXIL validator for Shader Model 6.9 should allow native vectors in the supported memory and intrinsic uses. +It should produce errors for uses in unsupported intrinsics, cbuffers, and typed buffers. + +Single-element vectors are allowed only as interfaces to library shaders. +Other usages of a single-element vector should produce a validation error. + +### Execution testing + +Full runtime execution should be tested by using the native vector intrinsics using + groupshared and non-groupshared memory. +Calculations should produce the correct results in all cases for a range of vector sizes. +In practice, this testing will largely represent verifying correct intrinsic output + with the new shader model. + +## Acknowledgments + +* [Anupama Chandrasekhar](https://github.com/anupamachandra) and [Tex Riddell](https://github.com/tex3d) for foundational contributions to the design. + +