diff --git a/proposals/0026-hlsl-long-vector-type.md b/proposals/0026-hlsl-long-vector-type.md
new file mode 100644
index 00000000..75423baa
--- /dev/null
+++ b/proposals/0026-hlsl-long-vector-type.md
@@ -0,0 +1,325 @@
+<!-- {% raw %} -->
+
+# HLSL Long Vectors
+
+* Proposal: [0026-HLSL-Vectors](0026-hlsl-long-vector-type.md)
+* Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra), [Greg Roth](https://github.com/pow2clk)
+* Sponsor: [Greg Roth](https://github.com/pow2clk)
+* Status: **Under Consideration**
+
+## Introduction
+
+HLSL has previously supported vectors of as many as four elements (int3, float4, etc.).
+These are useful in a traditional graphics context for representation and manipulation of
+ geometry and color information.
+As HLSL evolves into a more general-purpose language targeting both graphics and compute,
+ it benefits greatly from longer vectors that can represent larger operations directly
+ rather than breaking them down into smaller constituent vectors.
+This feature adds the ability to load, store, and perform elementwise operations on HLSL
+ vectors longer than four elements.
+
+## Motivation
+
+The adoption of machine learning techniques expressed as vector-matrix operations
+ requires larger vector sizes to be representable in HLSL.
+To take advantage of specialized hardware that can accelerate longer vector operations,
+ these vectors need to be preserved in the exchange format as well.
+
+## Proposed solution
+
+Enable vectors of length between 5 and 128 inclusive in HLSL using existing template-based vector declarations.
+Such vectors will hereafter be referred to as "long vectors".
+These will be supported for all elementwise intrinsics that take variable-length vector parameters.
+For certain operations, these vectors will be represented as native vectors using
+ [DXIL vectors](NNNN-dxil-vectors.md) and equivalent SPIR-V representations.
+
+## Detailed design
+
+### HLSL vectors
+
+Currently HLSL allows declaring vectors using a templated representation:
+
+```hlsl
+vector<T, N> name;
+```
+
+`T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type.
+`N` is the number of components and must be an integer between 1 and 4 inclusive.
+See the vector definition [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more details.
+This proposal adds support for long vectors of length greater than 4 by
+ allowing `N` to be greater than 4 where previously such a declaration would produce an error.
+
+The default behavior of HLSL vectors is preserved for backward compatibility: omitting the last parameter `N`
+defaults to a 4-component vector, and `vector name;` declares a 4-component float vector.
+See the vector [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more examples.
+Declarations of long vectors require the use of the template declaration.
+Unlike vector sizes between 1 and 4, no shorthand declarations that concatenate
+ the element type and number of elements (e.g. float2, double4) are allowed for long vectors.
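+
+As an illustrative sketch of these declaration rules (all names are hypothetical,
+ and the commented-out line shows a declaration that is assumed to be rejected):
+
+```hlsl
+// A user-defined alias is an ordinary typedef, not a built-in shorthand.
+typedef vector<float, 8> Vec8f;
+
+void DeclareExamples() {
+  vector<float, 8> weights = 0;  // long vector: the template form is required
+  Vec8f bias = 1;                // same type as vector<float, 8>
+  vector defaulted = 0;          // unchanged default: a 4-component float vector
+  // float8 rejected;            // no built-in float8 shorthand exists for long vectors
+}
+```
+
+The variables are declared as locals because non-static globals would land in a cbuffer,
+ where long vectors are not permitted.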
+
+#### Allowed Usage
+
+The new vectors will be supported in all shader stages including Node shaders.
+
+Long vectors can be:
+
+* Elements of arrays, structs, StructuredBuffers, and ByteAddressBuffers.
+* Parameters and return types of non-entry functions.
+* Stored in groupshared memory.
+* Static global variables.
+
+Long vectors are not permitted in:
+
+* Resource types other than ByteAddressBuffer or StructuredBuffer.
+* Any part of the shader's signature including entry function parameters and return types or
+  user-defined struct parameters.
+* Cbuffers or tbuffers.
+* Ray tracing `Parameter`, `Attributes`, or `Payload` parameter structures.
+* A work graph record.
+
+While this describes where long vectors can be used and later sections describe how,
+implementations may specify best practices in certain uses for optimal performance.
+
+#### Constructing vectors
+
+HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment.
+Vectors can be initialized and assigned from various casting operations including scalars and arrays.
+Long vectors will maintain equivalent casting abilities.
+
+Examples:
+
+```hlsl
+vector<uint, 5> InitList = {1, 2, 3, 4, 5};
+vector<uint, 6> Construct = vector<uint, 6>(6, 7, 8, 9, 0, 0);
+uint4 initval = {0, 0, 0, 0};
+vector<uint, 8> VecVec = {uint2(coord.xy), Construct}; // coord: assumed to be a uint vector declared elsewhere
+vector<uint, 6> Assigned = Construct;
+float arr[5];
+vector<float, 5> CastArr = (vector<float, 5>)arr;
+vector<float, 6> ArrScal = {arr, 7.9};
+vector<float, 10> ArrArr = {arr, arr};
+vector<float, 15> Scal = 4.2;
+```
+
+#### Vectors in Raw Buffers
+
+N-element vectors are loaded from and stored to ByteAddressBuffers using the templated load and store methods
+with a vector type of the required size as the template parameter and a byte offset parameter.
+
+```hlsl
+RWByteAddressBuffer myBuffer;
+
+vector<T, N> val = myBuffer.Load< vector<T, N> >(StartOffsetInBytes);
+myBuffer.Store< vector<T, N> >(StartOffsetInBytes + 100, val);
+
+```
+
+StructuredBuffers with N-element vectors are declared using the template syntax
+ with a long vector type as the template parameter.
+N-element vectors are read from and written to these StructuredBuffers
+by element index.
+
+```hlsl
+RWStructuredBuffer< vector<T, N> > myBuffer;
+
+vector<T, N> val = myBuffer.Load(elementIndex);
+myBuffer[elementIndex] = val; // RWStructuredBuffer writes go through the subscript operator
+
+```
+
+#### Accessing elements of long vectors
+
+Long vectors support the existing vector subscript operators `[]` to access the scalar element values.
+They do not support any swizzle operations.
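+
+A minimal sketch of subscript access, assuming a hypothetical helper named `SumElements`
+ and an 8-element vector chosen only for illustration:
+
+```hlsl
+// Hypothetical helper: sums the elements of an 8-element long vector.
+float SumElements(vector<float, 8> v) {
+  float sum = 0;
+  for (uint i = 0; i < 8; ++i)
+    sum += v[i];  // subscript access with a dynamic index
+  // v.x or v.xyzw would be rejected here: swizzles are not available on long vectors.
+  return sum;
+}
+```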
+
+#### Operations on long vectors
+
+Long vectors are supported by all HLSL intrinsics that perform [elementwise calculations](NNNN-dxil-vectors.md#elementwise-intrinsics),
+ accept variable-length vector parameters, and are not inherently limited to shorter vectors.
+Elementwise intrinsics perform the same operation on each element regardless of its position in the vector,
+ except that the position determines which element(s) of the other vector parameters are used in that calculation.
+
+Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions).
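+
+As a sketch of how elementwise intrinsics might be applied to long vectors,
+ the hypothetical helpers below use `max` and `mad` from the allowed list that follows.
+The function names and the 16-element size are assumptions made for illustration.
+
+```hlsl
+// Hypothetical activation helper: clamps negative elements to zero.
+vector<float, 16> Relu(vector<float, 16> x) {
+  vector<float, 16> zero = 0;  // scalar initializer applied to every element
+  return max(x, zero);         // max is evaluated independently per element
+}
+
+// Hypothetical helper: computes x[i] * scale[i] + bias[i] for each element i.
+vector<float, 16> ScaleAndBias(vector<float, 16> x,
+                               vector<float, 16> scale,
+                               vector<float, 16> bias) {
+  return mad(x, scale, bias);
+}
+```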
+
+#### Allowed elementwise vector intrinsics
+
+* Trigonometry: acos, asin, atan, atan2, cos, cosh, degrees, radians, sin, sinh, tan, tanh
+* Math: abs, ceil, clamp, exp, exp2, floor, fma, fmod, frac, frexp, ldexp, lerp, log, log10, log2, mad, max, min, pow, rcp, round, rsqrt, sign, smoothstep, sqrt, step, trunc
+* Float Ops: f16tof32, f32tof16, isfinite, isinf, isnan, modf, saturate
+* Bitwise Ops: reversebits, countbits, firstbithigh, firstbitlow
+* Logic Ops: and, or, select
+* Reductions: all, any, dot
+* Quad Ops: ddx, ddx_coarse, ddx_fine, ddy, ddy_coarse, ddy_fine, fwidth, QuadReadLaneAt, QuadReadLaneAcrossX, QuadReadLaneAcrossY, QuadReadLaneAcrossDiagonal
+* Wave Ops: WaveActiveBitAnd, WaveActiveBitOr, WaveActiveBitXor, WaveActiveProduct, WaveActiveSum, WaveActiveMin, WaveActiveMax, WaveMultiPrefixBitAnd, WaveMultiPrefixBitOr, WaveMultiPrefixBitXor, WaveMultiPrefixProduct, WaveMultiPrefixSum, WavePrefixSum, WavePrefixProduct, WaveReadLaneAt, WaveReadLaneFirst
+* Wave Reductions: WaveActiveAllEqual, WaveMatch
+* Type Conversions: asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16
+
+#### Disallowed vector intrinsics
+
+* Only applicable to shorter vectors: AddUint64, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
+* Only useful for disallowed variables: EvaluateAttributeAtSample, EvaluateAttributeCentroid, EvaluateAttributeSnapped, GetAttributeAtVertex
+
+### Interchange Format Additions
+
+Long vectors can be represented in DXIL, SPIR-V or other interchange formats as scalarized elements or native vectors.
+Representation of native vectors in DXIL depends on [DXIL vectors](NNNN-dxil-vectors.md).
+
+### Debug Support
+
+Long vectors should have first-class debug support: the compiler should emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that tools can use for a better debugging experience.
+These should enable tracking vectors through their scalarized and native vector usages.
+
+### Diagnostic Changes
+
+Error messages should be produced for use of long vectors in unsupported interfaces:
+
+* Typed buffer element types.
+* Parameters to the entry function.
+* Return types from the entry function.
+* Cbuffer blocks.
+* Cbuffer global variables.
+* Tbuffers.
+* Work graph records.
+* Mesh/amplification payload entry parameter structures.
+* `Payload`, `Parameter`, and `Attributes` parameter user-defined structs used in
+  `TraceRay()`, `CallShader()`, and `ReportHit()` ray tracing intrinsics.
+
+Errors should also be produced when long vectors are passed to intrinsics that take
+ variable-length vector parameters but are listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
+Attempting to use any swizzle member-style accessors on long vectors should produce an error.
+Declaring vectors of length longer than 128 should produce an error.
+
+### Validation Changes
+
+Validation should produce errors when a long vector is found in:
+
+* The shader signature.
+* A cbuffer/tbuffer.
+* Work graph records.
+* `Payload`, `Parameter`, and `Attributes` parameter user-defined structs used in
+  `TraceRay()`, `CallShader()`, and `ReportHit()` ray tracing intrinsics.
+* Metadata
+
+Note that disallowing long vectors in entry function signatures includes any user-defined structs
+ used in mesh and ray tracing shaders.
+
+Use of long vectors in unsupported intrinsics should produce validation errors.
+
+### Device Capability
+
+Devices that support Shader Model 6.9 will be required to fully support this feature.
+
+## Testing
+
+### Compilation Testing
+
+#### Correct output testing
+
+Verify that long vectors can be declared in all appropriate contexts:
+
+* Local variables.
+* Static global variables.
+* Non-entry parameters.
+* Non-entry return types.
+* StructuredBuffer elements.
+* Templated Load/Store methods on ByteAddressBuffers.
+* As members of arrays and structs in any of the above contexts.
+
+Verify that long vectors can be correctly initialized in all the forms listed in [Constructing vectors](#constructing-vectors).
+
+Verify that long vectors in supported intrinsics produce appropriate outputs.
+Supported intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
+ may produce intrinsic calls with native vector parameters where available
+ or scalarized parameters with individual scalar calls to the corresponding interchange format intrinsics.
+
+Verify that long vector elements can be accessed using the subscript operation with static or dynamic indices.
+
+Verify that long vectors of different sizes will reference different overloads of user and built-in functions.
+Verify that template instantiation using long vectors correctly creates variants for the right sizes.
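+
+A test for these two cases might look like the following sketch.
+All names are hypothetical, and the template portion assumes a language version with template support (HLSL 2021 or later).
+
+```hlsl
+// Hypothetical overload set: the vector length selects the overload.
+uint Count(vector<float, 5> v) { return 5; }
+uint Count(vector<float, 6> v) { return 6; }
+
+// Hypothetical template: T and N are expected to be deduced from the argument type.
+template <typename T, int N>
+T Last(vector<T, N> v) { return v[N - 1]; }
+
+void UseOverloads() {
+  vector<float, 5> a = 1;
+  vector<float, 6> b = 2;
+  uint na = Count(a);  // exact match for the 5-element overload
+  uint nb = Count(b);  // exact match for the 6-element overload
+  float la = Last(a);  // expected to instantiate Last<float, 5>
+  float lb = Last(b);  // expected to instantiate Last<float, 6>
+}
+```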
+
+Verification of correct interchange format output depends on the implementation and representation.
+Native vector DXIL intrinsics might be checked for as described in [DXIL vectors](NNNN-dxil-vectors.md)
+ if native DXIL vector output is supported.
+SPIR-V equivalent output should be checked as well.
+Scalarized representations are also possible depending on the compilation implementation.
+
+#### Invalid usage testing
+
+Verify that long vectors produce compilation errors when:
+
+* Declared in interfaces listed in [Diagnostic changes](#diagnostic-changes).
+* Passed as parameters to any intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
+* Used with any swizzle operation (e.g. `lvec.x`, `lvec.rg`, `lvec.wzyx`).
+* Declared with a length over the maximum size in any of the allowed contexts listed in [Allowed usage](#allowed-usage).
+
+### Validation Testing
+
+Verify that long vectors produce validation errors in:
+
+* Each element of the shader signature.
+* A cbuffer block struct.
+* Work graph record structs.
+* The mesh/amplification entry `Payload` parameter struct.
+* Each of the `Payload`, `Parameter`, `Attributes` parameter structs used in
+  `TraceRay()`, `CallShader()`, and `ReportHit()`,
+  and `anyhit`, `closesthit`, `miss`, and `callable` entry functions.
+* Any DXIL intrinsic that corresponds to the HLSL intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
+* Any metadata type.
+
+### Execution Testing
+
+Correct behavior for all of the intrinsics listed in [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
+ will be verified with execution tests that perform the operations on long vectors and confirm correct results
+ for the given test values.
+Where possible, these tests will be variations on existing tests for these intrinsics.
+
+## Alternatives considered
+
+The original proposal introduced an opaque type to HLSL that could represent longer vectors.
+This would have been used only for native vector operations.
+This would have limited the scope of the feature to small neural network evaluation and also contained the scope of testing somewhat.
+
+Representing vectors used in neural networks as LLVM vectors also allows leveraging existing optimizations.
+This direction also aligns with the long term roadmap of HLSL to enable generic vectors.
+Since the new data type would have required extensive testing as well,
+the testing burden saved may not have been substantial.
+Since these vectors are to be added eventually anyway, the testing serves multiple purposes.
+It makes sense to not introduce a new datatype but use HLSL vectors,
+even if the initial implementation only exposes partial functionality.
+
+The restrictions outlined in [Allowed Usage](#allowed-usage) were chosen because the restricted uses
+ weren't needed for the targeted applications.
+They were omitted because of unclear utility and to simplify the design.
+There's nothing about those use cases that is inherently incompatible with long vectors
+ and future work might consider relaxing those restrictions.
+
+Swizzle operations were not supported because they are limited to the first four elements.
+The accessors (xyzw or rgba) are named according to the expected content of
+ those vectors in a graphics context.
+Since that interpretation does not apply to longer vectors, it could be confusing.
+The subscript access is flexible and generic and makes other accessors redundant.
+
+## Open Issues
+
+* Q: Is there a limit on the number of components in a vector?
+  * A: 128. It's big enough for some known uses.
+       There aren't concrete reasons to restrict the vector length,
+       but having a limit facilitates testing and sets expectations for both hardware and software developers.
+
+* Q: Usage restrictions
+  * A: Long vectors may not form part of the shader signature.
+       There are many restrictions on signature elements including bit fields that determine if they are fully written.
+       By definition, these involve more interfaces that would require additional changes and testing.
+* Q: Does this have implications for existing HLSL source code compatibility?
+  * A: Existing HLSL code that makes no use of long vectors will have no semantic changes.
+* Q: Should this change the default N = 4 for vectors?
+  * A: No. While the default size of 4 is less intuitive in a world of larger vectors, existing code depends on this default, so it remains unchanged.
+* Q: How will SPIR-V be supported?
+  * A: TBD
+* Q: Should swizzle accessors be allowed for long vectors?
+  * A: No. It doesn't make sense since they can't be used to access all elements
+       and there's no way to create enough swizzle members to accommodate the longest allowed vector.
+* Q: How should scalar groupshared arrays be loaded/stored into/out of long vectors?
+  * A: After some consideration, we opted not to include explicit Load/Store operations for this function.
+       There are at least a couple ways this could be resolved, and the preferred solution is outside the scope.
+
+<!-- {% endraw %} -->
\ No newline at end of file
diff --git a/proposals/0026-hlsl-vector-type.md b/proposals/0026-hlsl-vector-type.md
deleted file mode 100644
index 7f52585b..00000000
--- a/proposals/0026-hlsl-vector-type.md
+++ /dev/null
@@ -1,146 +0,0 @@
-<!-- {% raw %} -->
-
-* Proposal: [0026-HLSL-Vectors](0026-hlsl-vector-type.md)
-* Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra)
-* Sponsor: [Damyan Pepper](https://github.com/damyanp)
-* Status: **Under Consideration**
-
-# HLSL Vectors
-
-## Introduction
-
-HLSL has supported vectors in a limited capacity (int3, float4, etc.), and these are scalarized in DXIL; small vectors while useful in a traditional graphics context do not scale well with the evolution on HLSL as a more general purpose language targetting Graphics and Compute. Notably, with the ubiquitous adoption of machine learning techniques which often get expressed as vector-matrix operations, there is a need for supporting larger vector sizes in HLSL and preserving these vector objects at the DXIL level to take advantage of specialized hardware that can accelerate vector operations.
-
-## Proposed solution
-
-Enable vectors of longer length in HLSL and preserve the vector type in DXIL.
-
-## Detailed design
-
-### HLSL vectors `vector<T, N>`
-
-Currently HLSL allows `vector<T, N> name;` where `T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type and `N`, number of
-components, is a positive integer less than or equal to 4. See current definition [here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector). 
-This proposal extends this support to longer vectors (beyond 4). 
-
-The default behavior of HLSL vectors is preserved for backward compatibility, meaning, skipping the last parameter `N`
-defaults to 4-component vectors and the use `vector name;` declares a 4-component float vector, etc. More examples
-[here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector).
-
-The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave
-uniformity requirements, but implementations may specify best practices in certain uses for optimal performance. 
-
-**Restrictions on the uses of vectors with N > 4** 
-
-* Vectors with length greater than 4 are not permitted inside a `struct`.
-* Vectors with length greater than 4 are not permitted as shader input/output parameters.
-
-**Constructing vectors**
-
-HLSL vectors can be constructed through initializer lists and constructor syntax initializing or by assignment.
-
-Examples:
-
-``` 
-vector<uint, 5> vecA = {1, 2, 3, 4, 5}; 
-vector<uint, 6> vecB = vector<uint, 6>(6, 7, 8, 9, 0, 0);
-uint4 initval = {0, 0, 0, 0};
-vector<uint, 8> vecC = {uint2(coord.xy), vecB};
-vector<uint, 6> vecD = vecB;
-```
-
-**Load and Store vectors from Buffers/Arrays**
-
-For loading and storing N-dimensional vectors from ByteAddressBuffers we use the `LoadN` and `StoreN` methods, extending
-the existing Load/Store, Load2/Store2, Load3/Store3 and Load4/Store4 methods.
-
-``` 
-// Load/Store from [RW]ByteAddressBuffers
-RWByteAddressBuffer myBuffer;
-
-vector<uint, N> val = myBuffer.LoadN(uint StartOffsetInBytes); 
-myBuffer.StoreN<T>(uint StartoffsetInBytes, vector<T, N> stVec);
-
-// Load/Store from groupshared arrays
-groupshared T inputArray[512];
-groupshared T outputArray[512];
-
-Load(vector<T,N> ldVec, groupshared inputArray, uint offsetInBytes);
-Store(vector<T,N> stVec, groupshared outputArray, uint offsetInBytes);
-```
-
-**Operations on vectors** 
-
-Support all HLSL intrinsics that are important as activation functions: fma, exp, log, tanh, atan, min, max, clamp, and
-step. Eventually support all HLSL operators and math intrinsics that are currently enabled for vectors.
-
-Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions).
-
-Note: Additionally any mathematical operations missing from the above list but needed as activation functions for neural
-network computations will be added.
-
-### Debug Support
-First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience. Open Issue: Handle DXIL scalarized and vector paths. 
-
-
-### Diagnostic Changes
-
-* Additional error messages for illegal or unsupported use of arbitrary length vectors.
-* Remove current bound checks (N <= 4) for vector size in supported cases, both HLSL and DXIL.
-
-
-### Validation Changes
-
-* What additional validation failures does this introduce?
-*Illegal uses of vectors should produce errors*
-* What existing validation failures does this remove?
-*Allow legal uses of vectors with number of components greater than 4*
-
-## D3D12 API Additions
-
-TODO: Possible checks for DXIL vector support and tiered support.
-
-## Check Feature Support
-
-Open Issue: Can implementations support vector DXIL?
-
-### Minimum Support Set
-
-
-## Testing
-
-* How will correct codegen for DXIL/SPIRV be tested?
-* How will the diagnostics be tested?
-* How will validation errors be tested?
-* How will validation of new DXIL elements be tested?
-* A: *unit tests in dxc*
-* How will the execution results be tested?
-* A: *HLK tests*
-
-
-## Alternatives considered
-
-Our original proposal introduced an opaque Cooperative Vector type to HLSL to limit the scope of the feature to small
-neural network evaluation and also contain the scope for testing. But aligning with the long term roadmap of HLSL to
-enable generic vectors, it makes sense to not introduce a new datatype but use HLSL vectors, even if the initial
-implementation only exposes partial functionality.
-
-## Open Issues
-* Q: Is there a limit on the Number of Components in a vector?
-* A: Chose a number based on precedents set by other languages. Support atleast 128.
-* Q: Usage restrictions
-* A: *General vectors (N > 4) are not permitted inside structs.*
-* Q: Does this have implications for existing HLSL source code compatibility?
-* A: *No, existing HLSL code is unaffected by this change.*
-* A: *Change the default N = 4 for vectors? Will affect existing shaders.*
-* Q: How will SPIRV be supported?
-* A: 
-* Q: When do HLSL vectors remain as vectors and when do they get scalarized in DXIL?
-* A: 
-* Q: Can all implementations support vector DXIL?
-* A: Feature check?
-
-## Acknowledgments
-
-
-<!-- {% endraw %} -->
\ No newline at end of file
diff --git a/proposals/NNNN-dxil-vectors.md b/proposals/NNNN-dxil-vectors.md
new file mode 100644
index 00000000..c19cd933
--- /dev/null
+++ b/proposals/NNNN-dxil-vectors.md
@@ -0,0 +1,190 @@
+<!-- {% raw %} -->
+
+# DXIL Vectors
+
+---
+
+* Proposal: [NNNN](NNNN-dxil-vectors.md)
+* Author(s): [Greg Roth](https://github.com/pow2clk)
+* Sponsor: [Greg Roth](https://github.com/pow2clk)
+* Status: **Under Consideration**
+* Planned Version: Shader Model 6.9
+
+## Introduction
+
+While DXIL is intended and able to support language vectors,
+ those vectors must be broken up into individual scalars to be valid DXIL.
+This feature introduces the ability to represent native vectors in DXIL for some uses.
+
+## Motivation
+
+Although many GPUs support vector operations, DXIL has been unable to directly leverage those capabilities.
+Instead, it has scalarized all vector operations, losing their original representation.
+To restore those vector representations, platforms have had to rely on auto-vectorization to
+ rematerialize vectors late in the compilation.
+Scalarization is a trivial compiler transformation that never fails,
+ but auto-vectorization is a notoriously difficult compiler optimization that frequently generates sub-optimal code.
+Allowing DXIL to retain vectors as they appeared in source allows hardware that can utilize
+ vector optimizations to do so more easily without penalizing hardware that requires scalarization.
+
+Native vector support can also help with the size of compiled DXIL programs.
+Vector operations can express in a single instruction operations that would have taken N instructions in scalar DXIL.
+This may allow reduced file sizes for compiled DXIL programs that utilize vectors.
+
+DXIL is based on LLVM 3.7 which already supports native vectors.
+These could only be used to a limited degree in DXIL library targets, and never for DXIL operations.
+This innate support is expected to make adding them a relatively low impact change to DXIL tools.
+
+## Proposed solution
+
+Native vectors are allowed in DXIL version 1.9 or greater.
+These can be stored in allocas, static globals, groupshared variables, and SSA values.
+They can be loaded from or stored to raw buffers and used as arguments to a selection
+ of element-wise intrinsic functions as well as the standard math operators.
+They cannot be used in shader signatures, constant buffers, typed buffers, or texture types.
+
+## Detailed design
+
+### Vectors in memory representations
+
+In their alloca and variable representations, vectors in DXIL will always be represented as vectors.
+Previously individual vectors would get scalarized into scalar arrays and arrays of vectors would be flattened
+ into a one-dimensional scalar array with indexing to reflect the original intents.
+Individual vectors will now be represented as a single native vector and arrays of vectors will remain
+ as arrays of native vectors, though multi-dimensional arrays will still be flattened to one dimension.
+
+Single-element vectors are generally not valid in DXIL.
+At the language level, they may be supported for corresponding intrinsic overloads,
+ but such vectors should be represented as scalars in the final DXIL output.
+Since they only contain a single scalar, single-element vectors are
+ informationally equivalent to actual scalars.
+Rather than include conversions to and from scalars and single-element vectors,
+ it is cleaner and functionally equivalent to represent these as scalars in DXIL.
+The exception is in exported library functions, which need to maintain vector representations
+ to correctly match overloads when linking.
+
+### Changes to DXIL Intrinsics
+
+A new form of rawBufferLoad allows loading of full vectors instead of four scalars.
+The status integer for tiled resource access is loaded just as before.
+The returned vector value and the status indicator are grouped into a new `ResRet` helper structure type
+ that the load intrinsic returns.
+
+```asm
+  ; overloads: SM6.9: f16|f32|i16|i32
+  ; returns: vector, status
+  declare %dx.types.ResRet.v[NUM][TY] @dx.op.rawBufferVectorLoad.v[NUM][TY](
+      i32,                  ; opcode
+      %dx.types.Handle,     ; resource handle
+      i32,                  ; coordinate c0 (byteOffset)
+      i32,                  ; coordinate c1 (elementOffset)
+      i32)                  ; alignment
+```
+
+
+The return struct contains a single vector and a single integer representing mapped tile status.
+
+```asm
+  %dx.types.ResRet.v[NUM][TY] = type { <[NUM] x [TYPE]>, i32 }
+```
+
+Here and hereafter, `NUM` is the number of elements in the loaded vector, `TYPE` is the element type name,
+ and `TY` is the corresponding abbreviated type name (e.g. `i64`, `f32`).
+
+#### Vector access
+
+Dynamic accesses to vectors were previously converted to array accesses.
+Native vectors can be accessed using `extractelement`, `insertelement`, or `getelementptr` operations.
+Previously, `extractelement` and `insertelement` in DXIL didn't allow dynamic index parameters.
+
+#### Elementwise intrinsics
+
+A selection of elementwise intrinsics are given additional native vector forms.
+Elementwise intrinsics are those that perform their calculations irrespective of the location of the element
+ in the vector or matrix arguments except insofar as that position corresponds to those of the other elements
+ that might be used in the individual element calculations.
+An elementwise intrinsic `foo` that takes scalar or vector arguments could theoretically implement its vector version using a simple loop and the scalar intrinsic variant.
+
+```c++
+vector<TYPE, NUM> foo(vector<TYPE, NUM> a, vector<TYPE, NUM> b) {
+  vector<TYPE, NUM> ret;
+  for (int i = 0; i < NUM; i++)
+    ret[i] = foo(a[i], b[i]); // scalar variant applied to each element
+  return ret;
+}
+```
+  
+For example, `fma` is an elementwise intrinsic because it multiplies and adds the corresponding elements of its argument vectors,
+ but `cross` is not because it performs an operation on the vectors as units,
+ pulling elements from different locations as the operation requires.
+
+The elementwise intrinsics that have native vector variants represent the
+ unary, binary, and tertiary generic operations:
+
+```asm
+ <[NUM] x [TYPE]> @dx.op.unary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1)
+ <[NUM] x [TYPE]> @dx.op.binary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1, <[NUM] x [TYPE]> operand2)
+ <[NUM] x [TYPE]> @dx.op.tertiary.v[NUM][TY](i32 opcode, <[NUM] x [TYPE]> operand1, <[NUM] x [TYPE]> operand2, <[NUM] x [TYPE]> operand3)
+```
+
+The scalarized variants of these DXIL intrinsics will remain unchanged and can be used in conjunction
+ with the vector variants.
+This means that the same language-level vector could be used in both scalarized and native vector operations
+ within the same shader by being scalarized as needed.
+
+### Validation Changes
+
+Blanket validation errors for use of native vectors in DXIL are removed.
+Specific disallowed usages of native vector types will be determined by
+ examining arguments to operations and intrinsics and producing errors where appropriate.
+Aggregate types will be recursed into to identify any native vector components.
+
+Native vectors should produce validation errors when:
+
+* Used in cbuffers.
+* Used in unsupported intrinsics or operations as before, but made more specific to the operations.
+* Used in shaders targeting previous shader models, apart from exported library functions.
+
+Errors should be produced for any representation of a single-element vector outside of
+ exported library functions.
+
+Specific errors might be generated for invalid overloads of `LoadInput` and `StoreOutput`
+ as they represent usage of vectors in entry point signatures.
+
+### Device Capability
+
+Devices that support Shader Model 6.9 will be required to fully support this feature.
+
+## Testing
+
+### Compilation Testing
+
+A compiler targeting shader model 6.9 should be able to represent vectors in the supported memory spaces
+ in their native form and generate native calls for supported intrinsics.
+
+Test that appropriate output is produced for:
+
+* Supported intrinsics and operations, which should retain vector types.
+* Dynamic indexing of vectors, which should produce the correct `extractelement` and `insertelement`
+ operations with dynamic index parameters.
+
+### Validation testing
+
+The DXIL 1.9 validator should allow native vectors in the supported memory and intrinsic uses.
+It should produce errors for uses in unsupported intrinsics, cbuffers, and typed buffers.
+
+Single-element vectors are allowed only as interfaces to library shaders.
+Other usages of a single-element vector should produce a validation error.
+
+### Execution testing
+
+Full runtime execution should be tested by exercising the native vector intrinsics with
+ groupshared and non-groupshared memory.
+Calculations should produce the correct results in all cases for a range of vector sizes.
+In practice, this testing will largely represent verifying correct intrinsic output
+ with the new shader model.
+
+## Acknowledgments
+
+* [Anupama Chandrasekhar](https://github.com/anupamachandra) and [Tex Riddell](https://github.com/tex3d) for foundational contributions to the design.
+
+<!-- {% endraw %} -->