Rework hlsl-vector-type into two specs #361

Open
wants to merge 9 commits into
base: main
from
164 changes: 101 additions & 63 deletions proposals/0026-hlsl-long-vector-type.md
@@ -9,27 +9,28 @@

## Introduction

HLSL has always supported vectors of as many as four elements of different element types (int3, float4, etc.).
HLSL has previously supported vectors of as many as four elements (int3, float4, etc.).
These are useful in a traditional graphics context for representation and manipulation of
geometry and color information.
The evolution of HLSL into a more general-purpose language targeting Graphics and Compute
greatly benefits from longer vectors that fully represent these operations rather than
breaking them down into smaller constituent vectors.
This feature adds the ability to declare and use native HLSL vectors longer than four elements.
This feature adds the ability to load, store, and perform select operations on HLSL vectors longer than four elements.

## Motivation

The adoption of machine learning techniques expressed as vector-matrix operations
requires larger vector sizes to be representable in HLSL.
To take advantage of specialized hardware that can accelerate vector operations,
these and other vector objects need to be preserved at the DXIL level.
To take advantage of specialized hardware that can accelerate longer vector operations,
these vectors need to be preserved in the exchange format as well.

## Proposed solution

Enable vectors of length between 4 and 128 inclusive in HLSL using existing template-based vector declarations.

if the range is inclusive, then that 4 should be 5.

Such vectors will hereafter be referred to as "long vectors".
These will be supported for all elementwise intrinsics that take variable-length vector parameters.
For certain operations, these vectors will be represented as native vectors using [dxil vectors](NNNN-dxil-vectors.md).
For certain operations, these vectors will be represented as native vectors using
[Dxil vectors](NNNN-dxil-vectors.md) and equivalent SPIR-V representations.

## Detailed design

@@ -54,29 +55,52 @@ Declarations of long vectors require the use of the template declaration.
Unlike vector sizes between 1 and 4, no shorthand declarations that concatenate
the element type and number of elements (e.g. float2, double4) are allowed for long vectors.
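
As a minimal sketch of the declaration rule above, assuming a compiler that implements this proposal (the function and variable names are hypothetical):

```hlsl
// Sketch only: names are illustrative and not part of the proposal.
void DeclareVectors() {
  float4 Short = float4(0, 0, 0, 0);  // existing shorthand for lengths 1 to 4
  vector<float, 4> SameType = Short;  // template spelling of the same type
  vector<float, 8> Long = 1.0;        // long vectors require the template form
  // float8 Bad;                      // expected error: no shorthand for long vectors
}
```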

#### Allowed Usage

The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave
uniformity requirements, but implementations may specify best practices in certain uses for optimal performance.
Contributor

Is this line needed for the long vector spec? It's probably implicitly assumed?

Member Author

Which part do you take to be implicitly assumed? I don't think support across all shader stages should be assumed. The control/wave uniformity requirements touch more on the followups and could probably be removed. I'm inclined to leave the implementations encouraging best practices language as some platforms might end up scalarizing vector operations at some point or another which won't lead to the best performance.


Long vectors can be:

* Elements of arrays, structs, StructuredBuffers, and ByteAddressBuffers.
* Parameters and return types of non-entry functions.
* Stored in groupshared memory.
* Static global variables.


I assume the type of a local (function scoped) variable can have long vector type. That's not mentioned. The examples below show cases like these.

Long vectors are not permitted in:

* Resource types other than ByteAddressBuffer or StructuredBuffer.
* Any element of the shader's signature including entry function parameters and return types.
* Any part of the shader's signature including entry function parameters and return types.
* Cbuffers or tbuffers.
* A mesh/amplification `Payload` entry parameter structure.
* A ray tracing `Parameter`, `Attributes`, or `Payload` parameter structure.
* A work graph record.
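
The two lists above could be exercised roughly as follows. This is a sketch under the assumption that the compiler implements the proposal as written; every identifier (`Layer`, `Layers`, `SharedAccum`, `Bias`, `AddBias`) is hypothetical.

```hlsl
// Sketch only: all names are illustrative.
struct Layer {
  vector<float, 16> Weights;                    // struct member
};

StructuredBuffer<Layer> Layers;                 // StructuredBuffer element holding a long vector
groupshared vector<float, 16> SharedAccum[64];  // array element in groupshared memory
static vector<float, 16> Bias = 0.0;            // static global variable

// Non-entry function parameter and return type.
vector<float, 16> AddBias(vector<float, 16> V) {
  return V + Bias;
}

[numthreads(64, 1, 1)]
void main(uint GI : SV_GroupIndex) {
  vector<float, 16> Local = Layers[0].Weights;  // function-scoped local
  SharedAccum[GI] = AddBias(Local);
}

// Each of the following is expected to be rejected under the "not permitted" list:
// cbuffer Params { vector<float, 16> Scale; };                      // cbuffer member
// Buffer<vector<float, 16> > Typed;                                 // typed buffer element
// float4 PSMain(vector<float, 16> In : TEXCOORD0) : SV_Target;      // entry signature
```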

#### Constructing vectors

HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment.
Vectors can be initialized and assigned from various casting operations including scalars and arrays.
Long vectors will maintain equivalent casting abilities.

Examples:

``` hlsl
vector<uint, 5> vecA = {1, 2, 3, 4, 5};
vector<uint, 6> vecB = vector<uint, 6>(6, 7, 8, 9, 0, 0);
```hlsl
vector<uint, 5> InitList = {1, 2, 3, 4, 5};

What happens if not all elements are listed explicitly. Are the rest zero-initialized?

vector<uint, 6> Construct = vector<uint, 6>(6, 7, 8, 9, 0, 0);
uint4 initval = {0, 0, 0, 0};
vector<uint, 8> vecC = {uint2(coord.xy), vecB};
vector<uint, 6> vecD = vecB;
vector<uint, 8> VecVec = {uint2(coord.xy), vecB};
vector<uint, 6> Assigned = vecB;
float arr[5];
vector<float, 5> CastArr = (vector<float, 5>)arr;
vector<float, 6> ArrScal = {arr, 7.9};
vector<float, 10> ArrArr = {arr, arr};
vector<float, 15> Scal = 4.2;
```

float4 main(uint size: S) : SV_Target {
return (float4)arr;
vector<uint, 8> vecC = {uint2(coord.xy), vecB};

#### Vectors in Raw Buffers

N-element vectors are loaded and stored from ByteAddressBuffers using the templated load and store methods
@@ -105,8 +129,9 @@ myBuffer.Store(elementIndex, val);

#### Accessing elements of long vectors

Long vectors support the existing vector subscript operators to return the scalar element values.
They do not support swizzle operations as they are limited to only the first four elements.
Long vectors support the existing vector subscript operators `[]` to access the scalar element values.
They do not support any swizzle operations.
Swizzle operations are limited to the first four elements and the accessors are named according to the graphics domain.
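
A short sketch of element access, assuming a compiler that implements this proposal (`SumElements` is a hypothetical helper):

```hlsl
// Sketch only: sums a long vector through the subscript operator.
float SumElements(vector<float, 8> V) {
  float Sum = 0;
  for (uint I = 0; I < 8; ++I)
    Sum += V[I];        // subscript with a dynamic index
  V[0] = Sum;           // subscript with a static index, used as an lvalue
  // float X = V.x;     // expected error: no swizzles on long vectors
  // float2 RG = V.rg;  // expected error, even though r and g are within the first four
  return Sum;
}
```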

#### Operations on long vectors

@@ -129,61 +154,60 @@ Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.micro
* Wave Ops: WaveActiveBitAnd, WaveActiveBitOr, WaveActiveBitXor, WaveActiveProduct, WaveActiveSum, WaveActiveMin, WaveActiveMax, WaveMultiPrefixBitAnd, WaveMultiPrefixBitOr, WaveMultiPrefixBitXor, WaveMultiPrefixProduct, WaveMultiPrefixSum, WavePrefixSum, WavePrefixProduct, WaveReadLaneAt, WaveReadLaneFirst
* Wave Reductions: WaveActiveAllEqual, WaveMatch

#### Native vector intrinsics

Of the above list, the following will produce the appropriate unary, binary, or ternary
DXIL intrinsics that take native vector parameters:

* fma
* exp
* log
* tanh
* atan
* min
* max
* clamp
* step

#### Disallowed vector intrinsics

* Only applicable to for shorter vectors: AddUint64, asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
* Only applicable to shorter vectors: AddUint64, asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
* Only useful for disallowed variables: EvaluateAttributeAtSample, EvaluateAttributeCentroid, EvaluateAttributeSnapped, GetAttributeAtVertex
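
To illustrate the split between allowed elementwise intrinsics and disallowed ones, here is a hedged sketch. It assumes `min` and `max` are among the allowed elementwise intrinsics, and `ClampToRange` is a hypothetical helper.

```hlsl
// Sketch only: elementwise intrinsics apply per component of a long vector.
vector<float, 8> ClampToRange(vector<float, 8> V,
                              vector<float, 8> Lo,
                              vector<float, 8> Hi) {
  return min(max(V, Lo), Hi);  // each intrinsic operates on all 8 elements
}

// Expected to be rejected, since `length` and `normalize` are listed as
// only applicable to shorter vectors:
// float Len = length(V);
// vector<float, 8> Unit = normalize(V);
```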

### Interchange Format Additions

Long vectors can be represented in DXIL, SPIR-V or other interchange formats as scalarized elements or native vectors.

This section should be informative.

A good SPIR-V representation could be as arrays of elements. So it's still a single SSA value.

Representation of native vectors in DXIL depends on [dxil vectors](NNNN-dxil-vectors.md).

### Debug Support

First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience. Open Issue: Handle DXIL scalarized and vector paths.
First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience.
These should enable tracking vectors through their scalarized and native vector usages.

### Diagnostic Changes

Error messages should be produced for use of long vectors in unsupported interfaces.

* The shader signature.
* A cbuffer/tbuffer.
* A work graph record.
* A mesh or ray tracing payload.
* Typed buffer element types.
* Parameters to the entry function.
* Return types from the entry function.
* Cbuffer blocks.
* Cbuffer global variables.
* Tbuffers.
* Work graph records.
* Mesh/amplification payload entry parameter structures.
* Ray tracing `Payload` parameter structures used in `TraceRay` and `anyhit`/`closesthit`/`miss` entry functions.
* Ray tracing `Parameter` parameter structures used in `CallShader` and `callable` entry functions.
* Ray tracing `Attributes` parameter structures used in `ReportHit` and `closesthit` entry functions.

Errors should also be produced when long vectors are passed to intrinsics that take
variable-length vector parameters but are listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
Attempting to use any swizzle member-style accessors on long vectors should produce an error.
Declaring vectors of length longer than 128 should produce an error.
Declaring vectors of length longer than 1024 should produce an error.

### Validation Changes

Validation should produce errors when a long vector is found in:
Collaborator

I think DXIL validation should be left to the DXIL spec, no?

Member Author

I did not find it useful to divide the specs along interchange format and language lines. The DXIL spec would have included elements of native vectors and also of long vectors while the language spec would have contained only part of the long vectors description. As such, I created two shader model features that can, but don't have to include language changes. This one includes language and DXIL changes. "The DXIL spec" only has DXIL changes, but its characteristic feature is that it documents native DXIL vectors.


* The shader signature.
* A cbuffer/tbuffer.
* A work graph record.
* A mesh or ray tracing payload.
* Work graph records.
* Mesh/amplification payload entry parameter structures.
* Ray tracing `Payload` parameter structures used in `TraceRay` and `anyhit`/`closesthit`/`miss` entry functions.
* Ray tracing `Parameter` parameter structures used in `CallShader` and `callable` entry functions.
* Ray tracing `Attributes` parameter structures used in `ReportHit` and `closesthit` entry functions.
* Metadata

Use of long vectors in unsupported intrinsics should produce validation errors.

## Runtime Additions

Support for Long vectors requires dxil vector support as defined in [the specification](NNNN-dxil-vectors.md).
### Device Capability

Use of long vectors in a shader should be indicated in DXIL with the corresponding
shader model version and shader feature flag.
Devices that support Shader Model 6.9 will be required to fully support this feature.
Member

In the interchange format section above we have:

Long vectors can be represented in DXIL, SPIR-V or other interchange formats as scalarized elements or native vectors.
Representation of native vectors in DXIL depends on dxil vectors.

This seems to imply that long vectors can be supported in existing shader models. I think it's only native DXIL vectors feature that actually requires SM 6.9?

Member Author

That wasn't what I intended to imply with that statement, rather that implementations could choose either approach as we intend to do at least temporarily. However, it is true that this is possible. One of the major remaining questions is if we should.

Member

Before I resolve this conversation: do we have something written down that is tracking this remaining question?

Member Author

I've created #371


## Testing

@@ -193,38 +217,46 @@ Use of long vectors in a shader should be indicated in DXIL with the correspondi

Verify that long vectors can be declared in all appropriate contexts:

* local variables
* non-entry parameters
* non-entry return types
* StructuredBuffer elements
* Templated Load/Store methods on ByteAddressBuffers
* As members of arrays and structs in any of the above contexts
* Local variables.
* Static global variables.
* Non-entry parameters.
* Non-entry return types.
* StructuredBuffer elements.
* Templated Load/Store methods on ByteAddressBuffers.
* As members of arrays and structs in any of the above contexts.

Verify that long vectors can be correctly initialized in all the forms listed in [Constructing vectors](#constructing-vectors).

Verify that long vectors in supported intrinsics produce appropriate outputs.
For the intrinsic functions listed in [Native vector intrinsics](#native-vector-intrinsics),
the generated DXIL intrinsic calls will have long vector parameters.
For other elementwise vector intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics),
the generated DXIL should scalarize the parameters and produce scalar calls to the corresponding DXIL intrinsics.
Supported intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
may produce intrinsic calls with native vector parameters where available
or scalarized parameters with individual scalar calls to the corresponding interchange format intrinsics.

Verify that long vector elements can be accessed using the subscript operation with static or dynamic indices.

Verify that long vectors of different sizes will reference different overloads of user and built-in functions.
Verify that template instantiation using long vectors correctly creates variants for the right sizes.
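
A test for these two points might look like the sketch below. It assumes HLSL 2021 template support and that the vector length can be deduced as an `int` non-type template parameter. `Length` and `Sum` are hypothetical test helpers.

```hlsl
// Sketch only: overloads distinguished purely by vector length.
uint Length(vector<float, 5> V) { return 5; }
uint Length(vector<float, 8> V) { return 8; }

// One template definition that should instantiate once per distinct length.
template <int N>
float Sum(vector<float, N> V) {
  float Total = 0;
  for (int I = 0; I < N; ++I)
    Total += V[I];
  return Total;
}

void UseOverloads() {
  vector<float, 5> A = 1.0;
  vector<float, 8> B = 2.0;
  uint LA = Length(A);  // should resolve to the 5-element overload
  uint LB = Length(B);  // should resolve to the 8-element overload
  float SA = Sum(A);    // should instantiate Sum<5>
  float SB = Sum(B);    // should instantiate Sum<8>
}
```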

Verify that long vector elements can be accessed using the subscript operation.
Verification of correct interchange format output depends on the implementation and representation.
Native vector DXIL intrinsics might be checked for as described in [Dxil vectors](NNNN-dxil-vectors.md)
if native DXIL vector output is supported.
SPIR-V equivalent output should be checked as well.
Scalarized representations are also possible depending on the compilation implementation.

#### Invalid usage testing

Verify that compilation errors are produced for long vectors used in:
Verify that long vectors produce compilation errors when:

* Entry function parameters
* Entry function returns
* Type buffer declarations
* Cbuffer blocks
* Cbuffer global variables
* Work graph records
* Mesh and ray tracing payloads
* Any intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics)
* Declared in interfaces listed in [Diagnostic changes](#diagnostic-changes).
* Passed as parameters to any intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
* Accessed through any swizzle operation (e.g. `lvec.x`, `lvec.rg`, `lvec.wzyx`).
* Declared with a length over the maximum size in any of the allowed contexts listed in [Allowed usage](#allowed-usage).

### Validation Testing
Collaborator

I think DXIL validation belongs in the DXIL spec only.


Verify that Validation produces errors for any DXIL intrinsic that corresponds to the
Verify that long vectors produce validation errors when:

* Verify that Validation produces errors for any DXIL intrinsic that corresponds to the
HLSL intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
Verify that Validation produces errors for any DXIL intrinsic with native vector parameters
that corresponds to the [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
@@ -251,6 +283,12 @@ Since these vectors are to be added eventually anyway, the testing serves multip
It makes sense to not introduce a new datatype but use HLSL vectors,
even if the initial implementation only exposes partial functionality.

The restrictions outlined in [Allowed Usage](#allowed-usage) were chosen because the restricted
uses weren't needed for the targeted applications, not because they are inherently impossible.
They were omitted because of unclear utility and to simplify the design.
There's nothing about those use cases that is inherently incompatible with long vectors
and future work might consider relaxing those restrictions.

## Open Issues

* Q: Is there a limit on the Number of Components in a vector?