From e8da05fe4d86d7434387e613599eac8b6745c957 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Tue, 7 Jan 2025 15:15:38 -0800 Subject: [PATCH 1/2] add SPV_INTEL_2d_block_io extension source --- README.md | 1 + .../INTEL/SPV_INTEL_2d_block_io.asciidoc | 753 ++++++++++++++++++ .../images/SPV_INTEL_2d_block_io-diagram.png | Bin 0 -> 31990 bytes 3 files changed, 754 insertions(+) create mode 100644 extensions/INTEL/SPV_INTEL_2d_block_io.asciidoc create mode 100644 extensions/INTEL/images/SPV_INTEL_2d_block_io-diagram.png diff --git a/README.md b/README.md index 49d9d68..59f2f21 100644 --- a/README.md +++ b/README.md @@ -101,6 +101,7 @@ Khronos SPIR-V Registry](https://www.khronos.org/registry/spir-v/). * [SPV_GOOGLE_user_type ]( https://github.khronos.org/SPIRV-Registry/extensions/GOOGLE/SPV_GOOGLE_user_type.html) * [SPV_HUAWEI_cluster_culling_shader ]( https://github.khronos.org/SPIRV-Registry/extensions/HUAWEI/SPV_HUAWEI_cluster_culling_shader.html) * [SPV_HUAWEI_subpass_shading ]( https://github.khronos.org/SPIRV-Registry/extensions/HUAWEI/SPV_HUAWEI_subpass_shading.html) +* [SPV_INTEL_2d_block_io ]( https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_2d_block_io.html) * [SPV_INTEL_arbitrary_precision_fixed_point]( https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_arbitrary_precision_fixed_point.html) * [SPV_INTEL_arbitrary_precision_floating_point]( https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_arbitrary_precision_floating_point.html) * [SPV_INTEL_arbitrary_precision_integers ]( https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_arbitrary_precision_integers.html) diff --git a/extensions/INTEL/SPV_INTEL_2d_block_io.asciidoc b/extensions/INTEL/SPV_INTEL_2d_block_io.asciidoc new file mode 100644 index 0000000..4c0a216 --- /dev/null +++ b/extensions/INTEL/SPV_INTEL_2d_block_io.asciidoc @@ -0,0 +1,753 @@ +:extension_name: SPV_INTEL_2d_block_io +:capability_name: Subgroup2DBlockIOINTEL 
+:capability_token: 6228 +:capability_transform_name: Subgroup2DBlockTransformINTEL +:capability_transform_token: 6229 +:capability_transpose_name: Subgroup2DBlockTransposeINTEL +:capability_transpose_token: 6230 +:op_load: OpSubgroup2DBlockLoadINTEL +:op_load_token: 6231 +:op_load_transform: OpSubgroup2DBlockLoadTransformINTEL +:op_load_transform_token: 6232 +:op_load_transpose: OpSubgroup2DBlockLoadTransposeINTEL +:op_load_transpose_token: 6233 +:op_prefetch: OpSubgroup2DBlockPrefetchINTEL +:op_prefetch_token: 6234 +:op_store: OpSubgroup2DBlockStoreINTEL +:op_store_token: 6235 + +{extension_name} +================ + +== Name Strings + +{extension_name} + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm + +== Contributors + +// spell-checker: disable +* Ben Ashbaugh, Intel +* Pekka Jääskeläinen, Intel +* Victor Mustya, Intel +* Yury Plyakhin, Intel +// spell-checker: enable + +== Notice + +Copyright (c) 2025 Intel Corporation. All rights reserved. + +== Status + +* Complete + +== Version + +[width="40%",cols="25,25"] +|======================================== +| Last Modified Date | 2025-01-07 +| Revision | 1 +|======================================== + +== Dependencies + +This extension is written against the SPIR-V Specification, +Version 1.6, Revision 4. + +This extension requires SPIR-V 1.0. + +This extension interacts with the *SPV_KHR_untyped_pointers* extension, by accepting untyped pointers as pointer operands. + +This extension interacts with the *SPV_INTEL_cache_controls* extension, by supporting cache control decorations on the pointer operands. + +== Overview + +This extension adds additional subgroup block load and store instructions to read two-dimensional blocks of data from a two-dimensional region of memory, or to write two-dimensional blocks of data to a two-dimensional region of memory.
+This is an important operation for many machine learning algorithms, which operate on two-dimensional matrix data as part of a matrix multiplication algorithm. + +The block sizes that are supported are device-specific. +A companion client API specification will describe the block sizes that are supported for a device. + +This extension additionally adds support for two pre-processing operations that may be performed when loading a two-dimensional block of data: + +1. The two-dimensional block may be _transposed_ after loading and before it is written to the instruction's destination. +2. The two-dimensional block may be _transformed_ after loading and before it is written to the instruction's destination. +The _transform_ operation converts the two-dimensional block from a _row-major_ layout to a _packed_ layout by combining data elements from multiple block rows into 32-bit values. +This layout is used by some matrix multiplication instructions. + +== Extension Name + +To use this extension within a SPIR-V module, the appropriate *OpExtension* must +be present in the module: + +[subs="attributes"] +---- +OpExtension "{extension_name}" +---- + +== Modifications to the SPIR-V Specification, Version 1.6 + +=== Capabilities + +Modify Section 3.31, Capability, adding rows to the Capability table: +-- +[options="header"] +|==== +2+^| Capability ^| Implicitly Declares +| {capability_token} | *{capability_name}* +| +| {capability_transform_token} | *{capability_transform_name}* +| *{capability_name}* +| {capability_transpose_token} | *{capability_transpose_name}* +| *{capability_name}* +|==== +-- + +=== Instructions + +Modify Section 3.42.21, Group Instructions, adding to the end of the list of instructions: + +[cols="1,1,10*3",width="100%"] +|===== +11+a|[[{op_load}]]*{op_load}* + +Loads one or more 2D blocks of data from a 2D row-major region of memory. +The 2D blocks of data are loaded collectively, as a subgroup operation. 
+ +The _Element Size_ operand specifies the size of one block element, in bytes. +The _Block Width_, _Block Height_, and _Block Count_ operands specify the total number of elements to load. +These operands must be constant instructions with scalar 32-bit integer type. + +The _Block Width_ specifies the number of elements in each block row. +The _Block Height_ specifies the number of rows in each block. +The _Block Count_ specifies the number of blocks to load. +If _Block Count_ is greater than one, the blocks are loaded in row-major order, with the next block beginning immediately after the previous block. + +_Src Base Pointer_ is a pointer to the base of the 2D region of memory to load from. +It must be a pointer to the *CrossWorkgroup* storage class. + +The _Memory Width_, _Memory Height_, and _Memory Pitch_ operands specify the 2D region of memory to load from. +These operands must be integer type scalars. + +The _Memory Width_ specifies the width of the 2D region of memory, in bytes. +The _Memory Height_ specifies the number of rows in the 2D region of memory. +The _Memory Pitch_ specifies the number of bytes between each row in the 2D region of memory. + +The _Coordinate_ operand specifies the starting location in the 2D region of memory to load from. +It must be a vector of two integer type components. + +The first component of _Coordinate_ specifies the number of elements to skip, from the start of a row. +The second component of _Coordinate_ specifies the number of rows to skip, from the base of the 2D region of memory. + +_Dst Pointer_ is a pointer to per-invocation storage that will hold the results of the 2D block load. +It must be a pointer to the *Function* storage class. + +Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction. 
+ +Behavior is undefined unless _Block Width_, _Block Height_, _Block Count_, _Src Base Pointer_, _Memory Width_, _Memory Height_, _Memory Pitch_, and _Coordinate_ are dynamically uniform for all invocations within the subgroup. + +[NOTE] +==== +Follows the templated function: + +[source] +---- +template +void OpSubgroup2DBlockLoadINTEL( + const T* srcBasePointer, + int memoryWidth, + int memoryHeight, + int memoryPitch, + int2 coordinate, + T* dstPointer); +---- +==== + +|Capability: + +*{capability_name}* +| 11 | {op_load_token} +| __ + +_Element Size_ +| __ + +_Block Width_ +| __ + +_Block Height_ +| __ + +_Block Count_ +| __ + +_Src Base Pointer_ +| __ + +_Memory Width_ +| __ + +_Memory Height_ +| __ + +_Memory Pitch_ +| __ + +_Coordinate_ +| __ + +_Dst Pointer_ +|===== + +[cols="1,1,10*3",width="100%"] +|===== +11+a|[[{op_load_transpose}]]*{op_load_transpose}* + +Loads and transposes one or more 2D blocks of data from a 2D row-major region of memory. +The 2D blocks of data are loaded collectively, as a subgroup operation. + +The _Element Size_ operand specifies the size of one block element, in bytes. +The _Block Width_, _Block Height_, and _Block Count_ operands specify the total number of elements to load. +These operands must be constant instructions with scalar 32-bit integer type. + +The _Block Width_ specifies the number of elements in each block row, pre-transpose. +The _Block Height_ specifies the number of rows in each block, pre-transpose. +The _Block Count_ specifies the number of blocks to load. +If _Block Count_ is greater than one, the blocks are loaded in row-major order, with the next block beginning immediately after the previous block. + +_Src Base Pointer_ is a pointer to the base of the 2D region of memory to load from. +It must be a pointer to the *CrossWorkgroup* storage class. + +The _Memory Width_, _Memory Height_, and _Memory Pitch_ operands specify the 2D region of memory to load from. +These operands must be integer type scalars. 
+ +The _Memory Width_ specifies the width of the 2D region of memory, in bytes. +The _Memory Height_ specifies the number of rows in the 2D region of memory. +The _Memory Pitch_ specifies the number of bytes between each row in the 2D region of memory. + +The _Coordinate_ operand specifies the starting location in the 2D region of memory to load from. +It must be a vector of two integer type components. + +The first component of _Coordinate_ specifies the number of elements to skip, from the start of a row. +The second component of _Coordinate_ specifies the number of rows to skip, from the base of the 2D region of memory. + +_Dst Pointer_ is a pointer to per-invocation storage that will hold the results of the transposed 2D block load. +It must be a pointer to the *Function* storage class. + +Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction. + +Behavior is undefined unless _Block Width_, _Block Height_, _Block Count_, _Src Base Pointer_, _Memory Width_, _Memory Height_, _Memory Pitch_, and _Coordinate_ are dynamically uniform for all invocations within the subgroup. + +[NOTE] +==== +Follows the templated function: + +[source] +---- +template +void OpSubgroup2DBlockLoadTransposeINTEL( + const T* srcBasePointer, + int memoryWidth, + int memoryHeight, + int memoryPitch, + int2 coordinate, + T* dstPointer); +---- +==== + +|Capability: + +*{capability_transpose_name}* +| 11 | {op_load_transpose_token} +| __ + +_Element Size_ +| __ + +_Block Width_ +| __ + +_Block Height_ +| __ + +_Block Count_ +| __ + +_Src Base Pointer_ +| __ + +_Memory Width_ +| __ + +_Memory Height_ +| __ + +_Memory Pitch_ +| __ + +_Coordinate_ +| __ + +_Dst Pointer_ +|===== + +[cols="1,1,10*3",width="100%"] +|===== +11+a|[[{op_load_transform}]]*{op_load_transform}* + +Loads and transforms one or more 2D blocks of data into a packed format from a 2D row-major region of memory. 
+The transformation combines elements from multiple rows of the 2D region into packed 32-bit values. +The 2D blocks of data are loaded and transformed collectively, as a subgroup operation. + +The _Element Size_ operand specifies the size of one block element, in bytes. +The _Block Width_, _Block Height_, and _Block Count_ operands specify the total number of elements to load. +These operands must be constant instructions with scalar 32-bit integer type. + +The _Block Width_ specifies the number of elements in each block row. +The _Block Height_ specifies the number of rows in each block. +The _Block Count_ specifies the number of blocks to load. +If _Block Count_ is greater than one, the blocks are loaded in row-major order, with the next block beginning immediately after the previous block. + +_Src Base Pointer_ is a pointer to the base of the 2D region of memory to load from. +It must be a pointer to the *CrossWorkgroup* storage class. + +The _Memory Width_, _Memory Height_, and _Memory Pitch_ operands specify the 2D region of memory to load from. +These operands must be integer type scalars. + +The _Memory Width_ specifies the width of the 2D region of memory, in bytes. +The _Memory Height_ specifies the number of rows in the 2D region of memory. +The _Memory Pitch_ specifies the number of bytes between each row in the 2D region of memory. + +The _Coordinate_ operand specifies the starting location in the 2D region of memory to load from. +It must be a vector of two integer type components. + +The first component of _Coordinate_ specifies the number of elements to skip, from the start of a row. +The second component of _Coordinate_ specifies the number of rows to skip, from the base of the 2D region of memory. + +_Dst Pointer_ is a pointer to per-invocation storage that will hold the results of the transformed 2D block load. +It must be a pointer to the *Function* storage class. 
+If it is an *OpTypePointer* pointer, it must point to a scalar 32-bit integer type. + +Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction. + +Behavior is undefined unless _Block Width_, _Block Height_, _Block Count_, _Src Base Pointer_, _Memory Width_, _Memory Height_, _Memory Pitch_, and _Coordinate_ are dynamically uniform for all invocations within the subgroup. + +[NOTE] +==== +Follows the templated function: + +[source] +---- +template +void OpSubgroup2DBlockLoadTransformINTEL( + const T* srcBasePointer, + int memoryWidth, + int memoryHeight, + int memoryPitch, + int2 coordinate, + uint* dstPointer); +---- +==== + +|Capability: + +*{capability_transform_name}* +| 11 | {op_load_transform_token} +| __ + +_Element Size_ +| __ + +_Block Width_ +| __ + +_Block Height_ +| __ + +_Block Count_ +| __ + +_Src Base Pointer_ +| __ + +_Memory Width_ +| __ + +_Memory Height_ +| __ + +_Memory Pitch_ +| __ + +_Coordinate_ +| __ + +_Dst Pointer_ +|===== + +[cols="1,1,9*3",width="100%"] +|===== +10+a|[[{op_prefetch}]]*{op_prefetch}* + +Prefetches one or more blocks of data from a 2D row-major region of memory into a cache. +Prefetching does not affect the functionality of a module but may change its performance characteristics. +The 2D blocks of data are prefetched collectively, as a subgroup operation. + +The _Element Size_ operand specifies the size of one block element, in bytes. +The _Block Width_, _Block Height_, and _Block Count_ operands specify the total number of elements to prefetch. +These operands must be constant instructions with scalar 32-bit integer type. + +The _Block Width_ specifies the number of elements in each block row. +The _Block Height_ specifies the number of rows in each block. +The _Block Count_ specifies the number of blocks to prefetch. 
+If _Block Count_ is greater than one, the blocks are prefetched in row-major order, with the next block beginning immediately after the previous block. + +_Src Base Pointer_ is a pointer to the base of the 2D region of memory to prefetch from. +It must be a pointer to the *CrossWorkgroup* storage class. + +The _Memory Width_, _Memory Height_, and _Memory Pitch_ operands specify the 2D region of memory to prefetch. +These operands must be integer type scalars. + +The _Memory Width_ specifies the width of the 2D region of memory, in bytes. +The _Memory Height_ specifies the number of rows in the 2D region of memory. +The _Memory Pitch_ specifies the number of bytes between each row in the 2D region of memory. + +The _Coordinate_ operand specifies the starting location in the 2D region of memory to prefetch from. +It must be a vector of two integer type components. + +The first component of _Coordinate_ specifies the number of elements to skip, from the start of a row. +The second component of _Coordinate_ specifies the number of rows to skip, from the base of the 2D region of memory. + +Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction. + +Behavior is undefined unless _Block Width_, _Block Height_, _Block Count_, _Src Base Pointer_, _Memory Width_, _Memory Height_, _Memory Pitch_, and _Coordinate_ are dynamically uniform for all invocations within the subgroup. 
+[NOTE] +==== +Follows the templated function: + +[source] +---- +template +void OpSubgroup2DBlockPrefetchINTEL( + const T* srcBasePointer, + int memoryWidth, + int memoryHeight, + int memoryPitch, + int2 coordinate); +---- +==== + +|Capability: + +*{capability_name}* +| 10 | {op_prefetch_token} +| __ + +_Element Size_ +| __ + +_Block Width_ +| __ + +_Block Height_ +| __ + +_Block Count_ +| __ + +_Src Base Pointer_ +| __ + +_Memory Width_ +| __ + +_Memory Height_ +| __ + +_Memory Pitch_ +| __ + +_Coordinate_ +|===== + +[cols="1,1,10*3",width="100%"] +|===== +11+a|[[{op_store}]]*{op_store}* + +Stores one or more 2D blocks of data to a 2D region of memory. +The 2D blocks of data are stored collectively, as a subgroup operation. + +The _Element Size_ operand specifies the size of one block element, in bytes. +The _Block Width_, _Block Height_, and _Block Count_ operands specify the total number of elements to store. +These operands must be constant instructions with scalar 32-bit integer type. + +The _Block Width_ specifies the number of elements in each block row. +The _Block Height_ specifies the number of rows in each block. +The _Block Count_ specifies the number of blocks to store. +If _Block Count_ is greater than one, the blocks are stored in row-major order, with the next block beginning immediately after the previous block. + +_Src Pointer_ is a pointer to per-invocation storage that holds the data to store. +It must be a pointer to the *Function* storage class. + +_Dst Base Pointer_ is a pointer to the base of the 2D region of memory to store to. +It must be a pointer to the *CrossWorkgroup* storage class. + +The _Memory Width_, _Memory Height_, and _Memory Pitch_ operands specify the 2D region of memory to store to. +These operands must be integer type scalars. + +The _Memory Width_ specifies the width of the 2D region of memory, in bytes. +The _Memory Height_ specifies the number of rows in the 2D region of memory.
+The _Memory Pitch_ specifies the number of bytes between each row in the 2D region of memory. + +The _Coordinate_ operand specifies the starting location in the 2D region of memory to store to. +It must be a vector of two integer type components. + +The first component of _Coordinate_ specifies the number of elements to skip, from the start of a row. +The second component of _Coordinate_ specifies the number of rows to skip, from the base of the 2D region of memory. + +Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction. + +Behavior is undefined unless _Block Width_, _Block Height_, _Block Count_, _Dst Base Pointer_, _Memory Width_, _Memory Height_, _Memory Pitch_, and _Coordinate_ are dynamically uniform for all invocations within the subgroup. + +[NOTE] +==== +Follows the templated function: + +[source] +---- +template +void OpSubgroup2DBlockStoreINTEL( + const T* srcPointer, + T* dstBasePointer, + int memoryWidth, + int memoryHeight, + int memoryPitch, + int2 coordinate); +---- +==== + +|Capability: + +*{capability_name}* +| 11 | {op_store_token} +| __ + +_Element Size_ +| __ + +_Block Width_ +| __ + +_Block Height_ +| __ + +_Block Count_ +| __ + +_Src Pointer_ +| __ + +_Dst Base Pointer_ +| __ + +_Memory Width_ +| __ + +_Memory Height_ +| __ + +_Memory Pitch_ +| __ + +_Coordinate_ +|===== + +== Diagram + +The diagram below shows the meaning of the 2D block load and store operands. + +image::images/SPV_INTEL_2d_block_io-diagram.png[align="center"] + +== Mapping Block Data to Invocations + +This section describes the mapping between the 2D block of data that is loaded or stored and the invocations in the subgroup. + +First, the _Block Width_ and _Block Height_ are padded, if necessary. +For *{op_load}*, *{op_load_transform}*, and *{op_store}*, the _Block Width_ is padded to the next power-of-two. +For *{op_load_transpose}*, the _Block Height_ is padded to the next power-of-two.
+For *{op_load_transform}*, the _Block Height_ is padded to a multiple of four for 1-byte elements, and a multiple of two for 2-byte elements. +For loads, the value of any padded elements is zero. +For stores, the value of any padded elements is ignored. + +For *{op_load_transform}*, the loaded block data is then transformed, by combining elements from multiple rows of a single column of the 2D region and packing them into 32-bit values. +For 2-byte elements, every two rows are combined into a 32-bit value, with the lower-numbered rows in the lower bits and the higher-numbered rows in the higher bits. +For 1-byte elements, every four rows are combined into a 32-bit value, with the lower-numbered rows in the lower bits and the higher-numbered rows in the higher bits. +This packed layout is sometimes referred to as a _VNNI_ layout. + +For *{op_load_transpose}*, the loaded block data is then transposed, by assigning the first column of the 2D block to the first row of the transposed 2D block, and so on. + +Next, the rows of the 2D block are assigned to invocations in the subgroup. +Because the padded block width and the subgroup size are both powers of two, there are three scenarios to consider: + +1. If the padded block width is equal to the subgroup size, each invocation is assigned one element of the block row. +2. If the padded block width is less than the subgroup size, multiple rows are assigned to the subgroup. +The first row is assigned to the first set of invocations, then the next row is assigned to the next set of invocations, and so on. +3. If the padded block width is greater than the subgroup size, multiple elements of each block row are assigned to each invocation. +The first set of elements are assigned to the first invocation, then the next set of elements are assigned to the next invocation, and so on. + +In all cases, the lower numbered columns are assigned to the lower numbered invocations. + +=== Examples + +1.
Loading a two row by four column block of elements (_Block Width_ equals four, _Block Height_ equals two), with a subgroup size of four, using *{op_load}*: ++ +-- +Block data: + +[cols="4*^", width="1%"] +|===== +| `0,0` | `0,1` | `0,2` | `0,3` +| `1,0` | `1,1` | `1,2` | `1,3` +|===== + +This is the case where the padded block width is equal to the subgroup size. In this case, each invocation is assigned one element of the block row. Therefore, because there are two rows: + +* Invocation 0 is assigned the values `0,0` and `1,0`. +* Invocation 1 is assigned the values `0,1` and `1,1`. +* Invocation 2 is assigned the values `0,2` and `1,2`. +* Invocation 3 is assigned the values `0,3` and `1,3`. +-- + +2. Loading a four row by two column block of elements (_Block Width_ equals two, _Block Height_ equals four), with a subgroup size of four, using *{op_load}*: ++ +-- +Block data: + +[cols="2*^", width="1%"] +|===== +| `0,0` | `0,1` +| `1,0` | `1,1` +| `2,0` | `2,1` +| `3,0` | `3,1` +|===== + +This is the case where the padded block width is less than the subgroup size. In this case, the first row is assigned to Invocation 0 and Invocation 1, and the second row is assigned to Invocation 2 and Invocation 3, and so on. Therefore: + +* Invocation 0 is assigned the values `0,0` and `2,0`. +* Invocation 1 is assigned the values `0,1` and `2,1`. +* Invocation 2 is assigned the values `1,0` and `3,0`. +* Invocation 3 is assigned the values `1,1` and `3,1`. +-- + +3. Loading a two row by eight column block of elements (_Block Width_ equals eight, _Block Height_ equals two), with a subgroup size of four, using *{op_load}*: ++ +-- +Block data: + +[cols="8*^", width="1%"] +|===== +| `0,0` | `0,1` | `0,2` | `0,3` | `0,4` | `0,5` | `0,6` | `0,7` +| `1,0` | `1,1` | `1,2` | `1,3` | `1,4` | `1,5` | `1,6` | `1,7` +|===== + +This is the case where the padded block width is greater than the subgroup size. 
In this case, the first set of elements of each block row is assigned to Invocation 0, the next set of elements are assigned to Invocation 1, and so on. Therefore: + +* Invocation 0 is assigned the values `0,0`, `0,1`, `1,0`, and `1,1`. +* Invocation 1 is assigned the values `0,2`, `0,3`, `1,2`, and `1,3`. +* Invocation 2 is assigned the values `0,4`, `0,5`, `1,4`, and `1,5`. +* Invocation 3 is assigned the values `0,6`, `0,7`, `1,6`, and `1,7`. +-- + +4. Loading a four row by two column block of elements (_Block Width_ equals two, _Block Height_ equals four), with a subgroup size of four, using *{op_load_transpose}*: ++ +-- +Block data (pre-transpose): + +[cols="2*^", width="1%"] +|===== +| `0,0` | `0,1` +| `1,0` | `1,1` +| `2,0` | `2,1` +| `3,0` | `3,1` +|===== + +After transposition, this is the same as the first example, so: + +* Invocation 0 is assigned the values `0,0` and `0,1`. +* Invocation 1 is assigned the values `1,0` and `1,1`. +* Invocation 2 is assigned the values `2,0` and `2,1`. +* Invocation 3 is assigned the values `3,0` and `3,1`. +-- + +5. Loading a two row by four column block of two-byte elements (_Block Width_ equals four, _Block Height_ equals two), with a subgroup size of four, using *{op_load_transform}*: ++ +-- +Block data: + +[cols="4*^", width="1%"] +|===== +| `0,0` | `0,1` | `0,2` | `0,3` +| `1,0` | `1,1` | `1,2` | `1,3` +|===== + +For two-byte elements, the transform operation combines every two rows together to form a 32-bit value. Therefore: + +* Invocation 0 is assigned the 32-bit value `1,0 | 0,0`. +* Invocation 1 is assigned the 32-bit value `1,1 | 0,1`. +* Invocation 2 is assigned the 32-bit value `1,2 | 0,2`. +* Invocation 3 is assigned the 32-bit value `1,3 | 0,3`. +-- + +6. 
Loading a four row by four column block of one-byte elements (_Block Width_ equals four, _Block Height_ equals four), with a subgroup size of four, using *{op_load_transform}*: ++ +-- +Block data: + +[cols="4*^", width="1%"] +|===== +| `0,0` | `0,1` | `0,2` | `0,3` +| `1,0` | `1,1` | `1,2` | `1,3` +| `2,0` | `2,1` | `2,2` | `2,3` +| `3,0` | `3,1` | `3,2` | `3,3` +|===== + +For one-byte elements, the transform operation combines every four rows together to form a 32-bit value. Therefore: + +* Invocation 0 is assigned the 32-bit value `3,0 | 2,0 | 1,0 | 0,0`. +* Invocation 1 is assigned the 32-bit value `3,1 | 2,1 | 1,1 | 0,1`. +* Invocation 2 is assigned the 32-bit value `3,2 | 2,2 | 1,2 | 0,2`. +* Invocation 3 is assigned the 32-bit value `3,3 | 2,3 | 1,3 | 0,3`. +-- + +== Out-of-Bounds Behavior + +If some or all of the 2D block is out-of-bounds, where the bounds are defined by the _Memory Width_ and _Memory Height_, the behavior is as follows: + +* For loads, any out-of-bounds elements are assigned the value zero. +* For prefetches and stores, any out-of-bounds elements are ignored. + +== Restrictions + +The following restrictions apply to the 2D block load, store and prefetch instructions added by this extension: + +* The _Element Size_ must be 1, 2, 4, or 8 bytes. +* The _Block Width_ must be a multiple of four for 1-byte elements, or a multiple of two for 2-byte elements. +* Behavior is undefined unless: + ** the first component of _Coordinate_ is a multiple of four for 1-byte elements, or a multiple of two for 2-byte elements. + ** the per-subgroup source or destination base address is cache-line aligned (64 bytes). + ** the per-invocation source or destination address is aligned to a multiple of the _Element Size_. + ** the _Memory Width_ is greater than or equal to 64 bytes and less than or equal to 2^24^ bytes. + ** the _Memory Height_ is greater than zero and less than or equal to 2^24^ rows.
+ ** the _Memory Pitch_ is greater than or equal to the _Memory Width_ and a multiple of 8 bytes. + ** the *SubgroupMaxSize* is a power of two. + ** the *SubgroupSize* is equal to the *SubgroupMaxSize*; in other words, this is a full subgroup. + +== Issues + +. How should this functionality work with untyped pointers (AKA opaque pointers)? ++ +-- +*RESOLVED*: Added an _Element Size_ operand to explicitly specify the amount of data to load or store vs. inferring the element size from typed pointers. +Note, this extension does not currently include optional _Memory Operands_ to specify pointer alignment, because the pointer must already be aligned due to hardware restrictions. +-- + +. Can we use a 32-bit integer-type scalar to represent the memory width, height, and pitch, or should we allow for 64-bit integers for very large matrices? ++ +-- +*RESOLVED*: We will use 32-bit integer-type scalars to represent the block width, height, and count, but we will allow for 64-bit integers to represent the memory width, height, and pitch, and for the block start coordinates. + +The client API environment specs will restrict all of these operands to 32-bit integers initially, however. +-- + +. Terminology-wise, should we use "width" and "height", or "rows" and "columns"? ++ +-- +*RESOLVED*: We will use "width" and "height" to describe both the block dimensions and the memory dimensions. +-- + +. Terminology-wise, how should we describe the coordinate to read? ++ +-- +*RESOLVED*: The operand will simply be described as a vector coordinate. +This avoids needing to describe "X" or "Y" or "Row" or "Column" in the operand names. +The first coordinate will be the "X" or "Column" coordinate, and the second coordinate will be the "Y" or "Row" coordinate. +-- + +. Terminology-wise, should we use "load" and "store", or "read" and "write"? ++ +-- +*RESOLVED*: We will use "load" and "store" for consistency with the rest of the SPIR-V specification. +-- + +.
What should the behavior be if some or all of the 2D block is out-of-bounds? ++ +-- +*RESOLVED*: The behavior is well-defined. +Specifically, out-of-bounds reads are assigned the value zero, and out-of-bounds prefetches and stores are ignored. +-- + + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|1|2025-01-07|Ben Ashbaugh|Initial revision for publication +|======================================== diff --git a/extensions/INTEL/images/SPV_INTEL_2d_block_io-diagram.png b/extensions/INTEL/images/SPV_INTEL_2d_block_io-diagram.png new file mode 100644 index 0000000000000000000000000000000000000000..e8855a6b7f7c00986f60fde9789aa3de60b79a77 GIT binary patch literal 31990 [binary PNG payload omitted]
From 6bf82c933a8efd8b7a230c526009a7dac3aa83e2 Mon Sep 17 00:00:00 2001 From: Ben Ashbaugh Date: Tue, 7 Jan 2025 15:19:15 -0800 Subject: [PATCH 2/2] add SPV_INTEL_2d_block_io HTML specification --- extensions/INTEL/SPV_INTEL_2d_block_io.html | 1420 +++++++++++++++++++ 1 file changed, 1420 insertions(+) create mode 100644 extensions/INTEL/SPV_INTEL_2d_block_io.html diff --git a/extensions/INTEL/SPV_INTEL_2d_block_io.html b/extensions/INTEL/SPV_INTEL_2d_block_io.html new file mode 100644 index 0000000..a9beb0e --- /dev/null +++ b/extensions/INTEL/SPV_INTEL_2d_block_io.html @@ -0,0 +1,1420 @@ + + + + + + + +SPV_INTEL_2d_block_io + + + + + +
+
+

Name Strings

+
+
+

SPV_INTEL_2d_block_io

+
+
+
+
+

Contact

+
+
+

To report problems with this extension, please open a new issue at:

+
+ +
+
+
+

Contributors

+
+
+
  • Ben Ashbaugh, Intel
  • Pekka Jääskeläinen, Intel
  • Victor Mustya, Intel
  • Yury Plyakhin, Intel
+
+
+
+
+

Notice

+
+
+

Copyright (c) 2025 Intel Corporation. All rights reserved.

+
+
+
+
+

Status

+
+
+
  • Complete
+
+
+
+
+

Version

+
+ ++++ + + + + + + + + + + +

Last Modified Date

2025-01-07

Revision

1

+
+
+
+

Dependencies

+
+
+

This extension is written against the SPIR-V Specification, +Version 1.6, Revision 4.

+
+
+

This extension requires SPIR-V 1.0.

+
+
+

This extension interacts with the SPV_KHR_untyped_pointers extension, by accepting untyped pointers as pointer operands.

+
+
+

This extension interacts with the SPV_INTEL_cache_controls extension, by supporting cache control decorations on the pointer operands.

+
+
+
+
+

Overview

+
+
+

This extension adds subgroup block load and store instructions to read two-dimensional blocks of data from a two-dimensional region of memory, or to write two-dimensional blocks of data to a two-dimensional region of memory. +This is an important operation for many machine learning algorithms, which operate on two-dimensional matrix data as part of a matrix multiplication algorithm.

+
+
+

The block sizes that are supported are device-specific. +A companion client API specification will describe the block sizes that are supported for a device.

+
+
+

This extension additionally adds support for two pre-processing operations that may be performed when loading a two-dimensional block of data:

+
+
+
  1. The two-dimensional block may be transposed after loading and before it is written to the instruction’s destination.
  2. The two-dimensional block may be transformed after loading and before it is written to the instruction’s destination. +The transform operation converts the two-dimensional block from a row-major layout to a packed layout by combining data elements from multiple block rows into 32-bit values. +This layout is used by some matrix multiplication instructions.
+
+
+
+
+

Extension Name

+
+
+

To use this extension within a SPIR-V module, the appropriate OpExtension must +be present in the module:

+
+
+
+
OpExtension "SPV_INTEL_2d_block_io"
+
+
+
+
+
+

Modifications to the SPIR-V Specification, Version 1.6

+
+
+

Capabilities

+
+

Modify Section 3.31, Capability, adding rows to the Capability table:

+
+
+
+ +++++ + + + + + + + + + + + + + + + + + + + + + + + +
Capability | Implicitly Declares
6228 Subgroup2DBlockIOINTEL |
6229 Subgroup2DBlockTransformINTEL | Subgroup2DBlockIOINTEL
6230 Subgroup2DBlockTransposeINTEL | Subgroup2DBlockIOINTEL
+
+
+
+
+

Instructions

+
+

Modify Section 3.42.21, Group Instructions, adding to the end of the list of instructions:

+
+ ++++++++++++++ + + + + + + + + + + + + + + + + + + + + +
+

OpSubgroup2DBlockLoadINTEL

+
+
+

Loads one or more 2D blocks of data from a 2D row-major region of memory. +The 2D blocks of data are loaded collectively, as a subgroup operation.

+
+
+

The Element Size operand specifies the size of one block element, in bytes. +The Block Width, Block Height, and Block Count operands specify the total number of elements to load. +These operands must be constant instructions with scalar 32-bit integer type.

+
+
+

The Block Width specifies the number of elements in each block row. +The Block Height specifies the number of rows in each block. +The Block Count specifies the number of blocks to load. +If Block Count is greater than one, the blocks are loaded in row-major order, with the next block beginning immediately after the previous block.

+
+
+

Src Base Pointer is a pointer to the base of the 2D region of memory to load from. +It must be a pointer to the CrossWorkgroup storage class.

+
+
+

The Memory Width, Memory Height, and Memory Pitch operands specify the 2D region of memory to load from. +These operands must be integer type scalars.

+
+
+

The Memory Width specifies the width of the 2D region of memory, in bytes. +The Memory Height specifies the number of rows in the 2D region of memory. +The Memory Pitch specifies the number of bytes between each row in the 2D region of memory.

+
+
+

The Coordinate operand specifies the starting location in the 2D region of memory to load from. +It must be a vector of two integer type components.

+
+
+

The first component of Coordinate specifies the number of elements to skip, from the start of a row. +The second component of Coordinate specifies the number of rows to skip, from the base of the 2D region of memory.
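
As a rough illustration of the addressing described above, the following C++ sketch models the load as a whole-block copy on the host. It is not part of the extension. The name `referenceBlockLoad`, the single `std::vector` result (rather than per-invocation storage), the horizontal placement of additional blocks when Block Count is greater than one, and the zero fill for out-of-bounds elements (taken from the resolution recorded in the Issues section) are all assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical host-side model of a 2D block load. It returns the whole block
// in one array and does not model how a real implementation distributes the
// data across the invocations of a subgroup.
template <typename T, int BlockWidth, int BlockHeight, int BlockCount>
std::vector<T> referenceBlockLoad(const std::uint8_t* srcBase,
                                  int memoryWidth,   // width of the region, in bytes
                                  int memoryHeight,  // number of rows in the region
                                  int memoryPitch,   // bytes between consecutive rows
                                  int x,             // elements to skip from the start of a row
                                  int y)             // rows to skip from the base of the region
{
    static_assert(std::is_trivially_copyable<T>::value, "T must be trivially copyable");
    assert(memoryPitch >= memoryWidth);

    constexpr int elementSize = static_cast<int>(sizeof(T));
    std::vector<T> dst(static_cast<std::size_t>(BlockWidth) * BlockHeight * BlockCount, T{});

    std::size_t out = 0;
    for (int b = 0; b < BlockCount; ++b) {
        for (int r = 0; r < BlockHeight; ++r) {
            for (int c = 0; c < BlockWidth; ++c, ++out) {
                const int row = y + r;
                // Assumption: block b starts BlockWidth elements after block b - 1.
                const long long byteCol =
                    (static_cast<long long>(x) + static_cast<long long>(b) * BlockWidth + c) * elementSize;
                const bool inBounds = row >= 0 && row < memoryHeight &&
                                      byteCol >= 0 && byteCol + elementSize <= memoryWidth;
                if (inBounds) {
                    const std::size_t offset = static_cast<std::size_t>(row) * memoryPitch +
                                               static_cast<std::size_t>(byteCol);
                    std::memcpy(&dst[out], srcBase + offset, elementSize);
                }
                // Out-of-bounds elements keep their zero initial value.
            }
        }
    }
    return dst;
}
```

In this model the Memory Pitch only affects the row stride, while the Memory Width and Memory Height only affect the bounds check, which mirrors the separation between the three operands in the text above.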

+
+
+

Dst Pointer is a pointer to per-invocation storage that will hold the results of the 2D block load. +It must be a pointer to the Function storage class.

+
+
+

Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction.

+
+
+

Behavior is undefined unless Block Width, Block Height, Block Count, Src Base Pointer, Memory Width, Memory Height, Memory Pitch, and Coordinate are dynamically uniform for all invocations within the subgroup.

+
+
+ + + + + +
+ + +
+

Follows the templated function:

+
+
+
+
template <typename T, int ElementSize,
+          int BlockWidth, int BlockHeight, int BlockCount>
+void OpSubgroup2DBlockLoadINTEL(
+    const T* srcBasePointer,
+    int memoryWidth,
+    int memoryHeight,
+    int memoryPitch,
+    int2 coordinate,
+    T* dstPointer);
+
+
+
+

Capability:
+Subgroup2DBlockIOINTEL

11

6231

<id>
+Element Size

<id>
+Block Width

<id>
+Block Height

<id>
+Block Count

<id>
+Src Base Pointer

<id>
+Memory Width

<id>
+Memory Height

<id>
+Memory Pitch

<id>
+Coordinate

<id>
+Dst Pointer

+ ++++++++++++++ + + + + + + + + + + + + + + + + + + + + +
+

OpSubgroup2DBlockLoadTransposeINTEL

+
+
+

Loads and transposes one or more 2D blocks of data from a 2D row-major region of memory. +The 2D blocks of data are loaded collectively, as a subgroup operation.

+
+
+

The Element Size operand specifies the size of one block element, in bytes. +The Block Width, Block Height, and Block Count operands specify the total number of elements to load. +These operands must be constant instructions with scalar 32-bit integer type.

+
+
+

The Block Width specifies the number of elements in each block row, pre-transpose. +The Block Height specifies the number of rows in each block, pre-transpose. +The Block Count specifies the number of blocks to load. +If Block Count is greater than one, the blocks are loaded in row-major order, with the next block beginning immediately after the previous block.
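
To make the pre-transpose wording concrete, the sketch below transposes a single row-major block on the host. It is only an illustration: `transposeBlock` is a hypothetical helper, it handles one block at a time, and it says nothing about how the transposed data is assigned to the invocations of a subgroup.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: transposes one row-major block.
// The input has blockHeight rows of blockWidth elements (the pre-transpose shape);
// the output has blockWidth rows of blockHeight elements.
template <typename T>
std::vector<T> transposeBlock(const std::vector<T>& block, int blockWidth, int blockHeight)
{
    assert(blockWidth > 0 && blockHeight > 0);
    assert(block.size() == static_cast<std::size_t>(blockWidth) * blockHeight);

    std::vector<T> out(block.size());
    for (int r = 0; r < blockHeight; ++r) {
        for (int c = 0; c < blockWidth; ++c) {
            // Element (r, c) of the source becomes element (c, r) of the result.
            out[static_cast<std::size_t>(c) * blockHeight + r] =
                block[static_cast<std::size_t>(r) * blockWidth + c];
        }
    }
    return out;
}
```

For example, a block with Block Width 3 and Block Height 2 holding the rows (1, 2, 3) and (4, 5, 6) becomes three rows of two elements: (1, 4), (2, 5), (3, 6).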

+
+
+

Src Base Pointer is a pointer to the base of the 2D region of memory to load from. +It must be a pointer to the CrossWorkgroup storage class.

+
+
+

The Memory Width, Memory Height, and Memory Pitch operands specify the 2D region of memory to load from. +These operands must be integer type scalars.

+
+
+

The Memory Width specifies the width of the 2D region of memory, in bytes. +The Memory Height specifies the number of rows in the 2D region of memory. +The Memory Pitch specifies the number of bytes between each row in the 2D region of memory.

+
+
+

The Coordinate operand specifies the starting location in the 2D region of memory to load from. +It must be a vector of two integer type components.

+
+
+

The first component of Coordinate specifies the number of elements to skip, from the start of a row. +The second component of Coordinate specifies the number of rows to skip, from the base of the 2D region of memory.

+
+
+

Dst Pointer is a pointer to per-invocation storage that will hold the results of the transposed 2D block load. +It must be a pointer to the Function storage class.

+
+
+

Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction.

+
+
+

Behavior is undefined unless Block Width, Block Height, Block Count, Src Base Pointer, Memory Width, Memory Height, Memory Pitch, and Coordinate are dynamically uniform for all invocations within the subgroup.

+
+
+ + + + + +
+ + +
+

Follows the templated function:

+
+
+
+
template <typename T, int ElementSize,
+          int BlockWidth, int BlockHeight, int BlockCount>
+void OpSubgroup2DBlockLoadTransposeINTEL(
+    const T* srcBasePointer,
+    int memoryWidth,
+    int memoryHeight,
+    int memoryPitch,
+    int2 coordinate,
+    T* dstPointer);
+
+
+
+

Capability:
+Subgroup2DBlockTransposeINTEL

11

6233

<id>
+Element Size

<id>
+Block Width

<id>
+Block Height

<id>
+Block Count

<id>
+Src Base Pointer

<id>
+Memory Width

<id>
+Memory Height

<id>
+Memory Pitch

<id>
+Coordinate

<id>
+Dst Pointer

+ ++++++++++++++ + + + + + + + + + + + + + + + + + + + + +
+

OpSubgroup2DBlockLoadTransformINTEL

+
+
+

Loads and transforms one or more 2D blocks of data into a packed format from a 2D row-major region of memory. +The transformation combines elements from multiple rows of the 2D region into packed 32-bit values. +The 2D blocks of data are loaded and transformed collectively, as a subgroup operation.
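
One way to picture the packing is the host-side sketch below. The text above only states that elements from multiple rows are combined into 32-bit values, so the details here are assumptions rather than the extension's definition: the helper name `transformBlock`, the choice of combining 4 / sizeof(T) consecutive rows, placing the lowest row in the least significant bits, and the row-major arrangement of the packed values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical helper: packs a row-major block of 8-bit or 16-bit elements into
// 32-bit values by combining the same column from consecutive rows.
template <typename T>
std::vector<std::uint32_t> transformBlock(const std::vector<T>& block,
                                          int blockWidth, int blockHeight)
{
    static_assert(std::is_unsigned<T>::value, "use an unsigned element type to avoid sign extension");
    static_assert(sizeof(T) == 1 || sizeof(T) == 2, "sketch covers 8-bit and 16-bit elements only");

    // Assumption: enough rows are combined to fill exactly one 32-bit value.
    constexpr int rowsPerValue = static_cast<int>(4 / sizeof(T));
    constexpr int bitsPerElement = static_cast<int>(8 * sizeof(T));

    assert(blockWidth > 0 && blockHeight > 0);
    assert(blockHeight % rowsPerValue == 0);
    assert(block.size() == static_cast<std::size_t>(blockWidth) * blockHeight);

    const int packedRows = blockHeight / rowsPerValue;
    std::vector<std::uint32_t> out(static_cast<std::size_t>(packedRows) * blockWidth, 0u);

    for (int g = 0; g < packedRows; ++g) {
        for (int c = 0; c < blockWidth; ++c) {
            std::uint32_t packed = 0;
            for (int k = 0; k < rowsPerValue; ++k) {
                const std::size_t src =
                    static_cast<std::size_t>(g * rowsPerValue + k) * blockWidth + c;
                // Assumption: the lowest row lands in the least significant bits.
                packed |= static_cast<std::uint32_t>(block[src]) << (bitsPerElement * k);
            }
            out[static_cast<std::size_t>(g) * blockWidth + c] = packed;
        }
    }
    return out;
}
```

Under these assumptions, a 16-bit block with rows (1, 2) and (3, 4) packs to the two values 0x00030001 and 0x00040002, which is the kind of layout some matrix multiplication instructions consume.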

+
+
+

The Element Size operand specifies the size of one block element, in bytes. +The Block Width, Block Height, and Block Count operands specify the total number of elements to load. +These operands must be constant instructions with scalar 32-bit integer type.

+
+
+

The Block Width specifies the number of elements in each block row. +The Block Height specifies the number of rows in each block. +The Block Count specifies the number of blocks to load. +If Block Count is greater than one, the blocks are loaded in row-major order, with the next block beginning immediately after the previous block.

+
+
+

Src Base Pointer is a pointer to the base of the 2D region of memory to load from. +It must be a pointer to the CrossWorkgroup storage class.

+
+
+

The Memory Width, Memory Height, and Memory Pitch operands specify the 2D region of memory to load from. +These operands must be integer type scalars.

+
+
+

The Memory Width specifies the width of the 2D region of memory, in bytes. +The Memory Height specifies the number of rows in the 2D region of memory. +The Memory Pitch specifies the number of bytes between each row in the 2D region of memory.

+
+
+

The Coordinate operand specifies the starting location in the 2D region of memory to load from. +It must be a vector of two integer type components.

+
+
+

The first component of Coordinate specifies the number of elements to skip, from the start of a row. +The second component of Coordinate specifies the number of rows to skip, from the base of the 2D region of memory.

+
+
+

Dst Pointer is a pointer to per-invocation storage that will hold the results of the transformed 2D block load. +It must be a pointer to the Function storage class. +If it is an OpTypePointer pointer, it must point to a scalar 32-bit integer type.

+
+
+

Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction.

+
+
+

Behavior is undefined unless Block Width, Block Height, Block Count, Src Base Pointer, Memory Width, Memory Height, Memory Pitch, and Coordinate are dynamically uniform for all invocations within the subgroup.

+
+
+ + + + + +
+ + +
+

Follows the templated function:

    template <typename T, int ElementSize,
              int BlockWidth, int BlockHeight, int BlockCount>
    void OpSubgroup2DBlockLoadTransformINTEL(
        const T* srcBasePointer,
        int memoryWidth,
        int memoryHeight,
        int memoryPitch,
        int2 coordinate,
        uint* dstPointer);

[cols="12*1",width="100%"]
|=====
12+| Capability: Subgroup2DBlockTransformINTEL
| 11
| 6232
| <id> Element Size
| <id> Block Width
| <id> Block Height
| <id> Block Count
| <id> Src Base Pointer
| <id> Memory Width
| <id> Memory Height
| <id> Memory Pitch
| <id> Coordinate
| <id> Dst Pointer
|=====

OpSubgroup2DBlockPrefetchINTEL

Prefetches one or more blocks of data from a 2D row-major region of memory into a cache.
Prefetching does not affect the functionality of a module but may change its performance characteristics.
The 2D blocks of data are prefetched collectively, as a subgroup operation.

The Element Size operand specifies the size of one block element, in bytes.
The Block Width, Block Height, and Block Count operands specify the total number of elements to prefetch.
These operands must be constant instructions with scalar 32-bit integer type.

The Block Width specifies the number of elements in each block row.
The Block Height specifies the number of rows in each block.
The Block Count specifies the number of blocks to prefetch.
If Block Count is greater than one, the blocks are prefetched in row-major order, with the next block beginning immediately after the previous block.

Src Base Pointer is a pointer to the base of the 2D region of memory to prefetch from.
It must be a pointer to the CrossWorkgroup storage class.

The Memory Width, Memory Height, and Memory Pitch operands specify the 2D region of memory to prefetch.
These operands must be integer type scalars.

The Memory Width specifies the width of the 2D region of memory, in bytes.
The Memory Height specifies the number of rows in the 2D region of memory.
The Memory Pitch specifies the number of bytes between each row in the 2D region of memory.

The Coordinate operand specifies the starting location in the 2D region of memory to prefetch from.
It must be a vector of two integer type components.

The first component of Coordinate specifies the number of elements to skip, from the start of a row.
The second component of Coordinate specifies the number of rows to skip, from the base of the 2D region of memory.

Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction.

Behavior is undefined unless Block Width, Block Height, Block Count, Src Base Pointer, Memory Width, Memory Height, Memory Pitch, and Coordinate are dynamically uniform for all invocations within the subgroup.

Follows the templated function:

    template <typename T, int ElementSize,
              int BlockWidth, int BlockHeight, int BlockCount>
    void OpSubgroup2DBlockPrefetchINTEL(
        const T* srcBasePointer,
        int memoryWidth,
        int memoryHeight,
        int memoryPitch,
        int2 coordinate);

[cols="11*1",width="100%"]
|=====
11+| Capability: Subgroup2DBlockIOINTEL
| 10
| 6234
| <id> Element Size
| <id> Block Width
| <id> Block Height
| <id> Block Count
| <id> Src Base Pointer
| <id> Memory Width
| <id> Memory Height
| <id> Memory Pitch
| <id> Coordinate
|=====

OpSubgroup2DBlockStoreINTEL

Stores one or more 2D blocks of data to a 2D region of memory.
The 2D blocks of data are stored collectively, as a subgroup operation.

The Element Size operand specifies the size of one block element, in bytes.
The Block Width, Block Height, and Block Count operands specify the total number of elements to store.
These operands must be constant instructions with scalar 32-bit integer type.

The Block Width specifies the number of elements in each block row.
The Block Height specifies the number of rows in each block.
The Block Count specifies the number of blocks to store.
If Block Count is greater than one, the blocks are stored in row-major order, with the next block beginning immediately after the previous block.

Src Pointer is a pointer to per-invocation storage that holds the data to store.
It must be a pointer to the Function storage class.

Dst Base Pointer is a pointer to the base of the 2D region of memory to store to.
It must be a pointer to the CrossWorkgroup storage class.

The Memory Width, Memory Height, and Memory Pitch operands specify the 2D region of memory to store to.
These operands must be integer type scalars.

The Memory Width specifies the width of the 2D region of memory, in bytes.
The Memory Height specifies the number of rows in the 2D region of memory.
The Memory Pitch specifies the number of bytes between each row in the 2D region of memory.

The Coordinate operand specifies the starting location in the 2D region of memory to store to.
It must be a vector of two integer type components.

The first component of Coordinate specifies the number of elements to skip, from the start of a row.
The second component of Coordinate specifies the number of rows to skip, from the base of the 2D region of memory.

Behavior is undefined unless all invocations within the subgroup execute the same dynamic instance of this instruction.

Behavior is undefined unless Block Width, Block Height, Block Count, Dst Base Pointer, Memory Width, Memory Height, Memory Pitch, and Coordinate are dynamically uniform for all invocations within the subgroup.

Follows the templated function:

    template <typename T, int ElementSize,
              int BlockWidth, int BlockHeight, int BlockCount>
    void OpSubgroup2DBlockStoreINTEL(
        const T* srcPointer,
        T* dstBasePointer,
        int memoryWidth,
        int memoryHeight,
        int memoryPitch,
        int2 coordinate);

[cols="12*1",width="100%"]
|=====
12+| Capability: Subgroup2DBlockIOINTEL
| 11
| 6235
| <id> Element Size
| <id> Block Width
| <id> Block Height
| <id> Block Count
| <id> Src Pointer
| <id> Dst Base Pointer
| <id> Memory Width
| <id> Memory Height
| <id> Memory Pitch
| <id> Coordinate
|=====
Diagram

The diagram below shows the meaning of the 2D block load and store operands.

[Image: SPV INTEL 2d block io diagram]
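As a rough illustration of how the operands relate to one another, the sketch below computes the byte offset, from the base pointer, of a single block element. The names `BlockElement` and `elementByteOffset` are hypothetical and not part of the extension. Placing each block immediately to the right of the previous one, in the same rows, is one reading of "the next block beginning immediately after the previous block" and should be treated as an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, not part of the extension.
// Identifies one element of a multi-block 2D access.
struct BlockElement {
    int block;   // which block, in [0, Block Count)
    int row;     // row within the block, in [0, Block Height)
    int column;  // column within the block, in [0, Block Width)
};

// Byte offset from the base pointer of element `e`, for an access whose
// Coordinate is (coordX, coordY).
// Assumption: block b starts blockWidth elements to the right of block b - 1.
std::int64_t elementByteOffset(BlockElement e, int elementSize, int blockWidth,
                               int memoryPitch, int coordX, int coordY) {
    assert(elementSize > 0 && blockWidth > 0 && memoryPitch > 0);
    // Horizontal position in elements, vertical position in rows.
    const std::int64_t x =
        coordX + static_cast<std::int64_t>(e.block) * blockWidth + e.column;
    const std::int64_t y = static_cast<std::int64_t>(coordY) + e.row;
    // Rows are Memory Pitch bytes apart; elements are Element Size bytes apart.
    return y * memoryPitch + x * elementSize;
}
```

Note that the first component of Coordinate is counted in elements, while Memory Width and Memory Pitch are counted in bytes, which is why only the horizontal term is scaled by the element size.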

Mapping Block Data to Invocations

This section describes the mapping between the 2D block of data that is loaded or stored and the invocations in the subgroup.

First, the Block Width and Block Height are padded, if necessary.
For OpSubgroup2DBlockLoadINTEL, OpSubgroup2DBlockLoadTransformINTEL, and OpSubgroup2DBlockStoreINTEL, the Block Width is padded to the next power-of-two.
For OpSubgroup2DBlockLoadTransposeINTEL, the Block Height is padded to the next power-of-two.
For OpSubgroup2DBlockLoadTransformINTEL, the Block Height is padded to a multiple of four for 1-byte elements, and a multiple of two for 2-byte elements.
For loads, the value of any padded elements is zero.
For stores, the value of any padded elements is ignored.
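The padding rules can be sketched as a small function. This is only an illustration: `BlockOp`, `BlockDims`, `nextPowerOfTwo`, `roundUpToMultiple`, and `padBlock` are hypothetical names, and the sketch assumes that a dimension that already satisfies a rule is left unchanged.

```cpp
#include <cassert>

// Hypothetical names, for illustration only.
enum class BlockOp { Load, LoadTransform, LoadTranspose, Store };

struct BlockDims {
    int width;   // elements per block row
    int height;  // rows per block
};

// Smallest power of two greater than or equal to v (v must be positive).
int nextPowerOfTwo(int v) {
    assert(v > 0);
    int p = 1;
    while (p < v) p *= 2;
    return p;
}

// Smallest multiple of m greater than or equal to v.
int roundUpToMultiple(int v, int m) {
    assert(v > 0 && m > 0);
    return ((v + m - 1) / m) * m;
}

// Applies the padding rules described above to the block dimensions.
BlockDims padBlock(BlockOp op, int elementSize, BlockDims dims) {
    BlockDims padded = dims;
    if (op == BlockOp::LoadTranspose) {
        // Transposed loads pad the height to a power of two.
        padded.height = nextPowerOfTwo(dims.height);
    } else {
        // Plain loads, transformed loads, and stores pad the width.
        padded.width = nextPowerOfTwo(dims.width);
    }
    if (op == BlockOp::LoadTransform) {
        // Transformed loads also pad the height so rows pack into 32 bits.
        if (elementSize == 1) padded.height = roundUpToMultiple(dims.height, 4);
        if (elementSize == 2) padded.height = roundUpToMultiple(dims.height, 2);
    }
    return padded;
}
```

For example, under these assumptions a transformed load of 1-byte elements with a Block Width of six and a Block Height of five is padded to a width of eight and a height of eight.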

For OpSubgroup2DBlockLoadTransformINTEL, the loaded block data is then transformed, by combining elements from multiple rows of a single column of the 2D region and packing them into 32-bit values.
For 2-byte elements, every two rows are combined into a 32-bit value, with the lower-numbered rows in the lower bits and the higher-numbered rows in the higher bits.
For 1-byte elements, every four rows are combined into a 32-bit value, with the lower-numbered rows in the lower bits and the higher-numbered rows in the higher bits.
This packed layout is sometimes referred to as a VNNI layout.
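The packing step can be modeled on the host as follows. This is a minimal sketch rather than a description of any implementation: `Row` and `packRowsVnni` are hypothetical names, each element is held in the low bits of a `uint32_t`, and the block height is assumed to have been padded already.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One block row; each element occupies the low bits of a 32-bit value.
using Row = std::vector<std::uint32_t>;

// Packs groups of rows into 32-bit values, column by column.
// Lower-numbered rows land in the lower bits of each packed value.
std::vector<Row> packRowsVnni(const std::vector<Row>& block, int elementSize) {
    assert(elementSize == 1 || elementSize == 2);
    const std::size_t rowsPerValue = 4 / static_cast<std::size_t>(elementSize);
    const unsigned bits = 8u * static_cast<unsigned>(elementSize);
    const std::uint32_t mask = (1u << bits) - 1u;
    // Assumes the height was already padded to a multiple of rowsPerValue.
    assert(block.size() % rowsPerValue == 0);

    std::vector<Row> packed;
    for (std::size_t r = 0; r < block.size(); r += rowsPerValue) {
        Row out(block[r].size(), 0u);
        for (std::size_t k = 0; k < rowsPerValue; ++k) {
            assert(block[r + k].size() == out.size());
            for (std::size_t c = 0; c < out.size(); ++c) {
                // Row r + k contributes bits [k * bits, (k + 1) * bits).
                out[c] |= (block[r + k][c] & mask) << (k * bits);
            }
        }
        packed.push_back(out);
    }
    return packed;
}
```

With 2-byte elements, a column holding `0x0001` in row 0 and `0x0003` in row 1 packs to `0x00030001`.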


For OpSubgroup2DBlockLoadTransposeINTEL, the loaded block data is then transposed, by assigning the first column of the 2D block to the first row of the transposed 2D block, and so on.
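A host-side sketch of the transposition step, using the hypothetical helper `transposeBlock` on a row-major `block[row][column]` container, might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: transposes a row-major block[row][column].
// Column c of the input becomes row c of the output.
template <typename T>
std::vector<std::vector<T>> transposeBlock(
    const std::vector<std::vector<T>>& block) {
    assert(!block.empty());
    const std::size_t rows = block.size();
    const std::size_t cols = block[0].size();
    std::vector<std::vector<T>> out(cols, std::vector<T>(rows));
    for (std::size_t r = 0; r < rows; ++r) {
        assert(block[r].size() == cols);  // all rows must have equal length
        for (std::size_t c = 0; c < cols; ++c) {
            out[c][r] = block[r][c];
        }
    }
    return out;
}
```

A block with four rows and two columns therefore becomes a block with two rows and four columns, and it is the transposed block that is assigned to invocations.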

Next, the rows of the 2D block are assigned to invocations in the subgroup.
Because the padded block width and the subgroup size are both powers of two, there are three scenarios to consider:

1. If the padded block width is equal to the subgroup size, each invocation is assigned one element of the block row.
2. If the padded block width is less than the subgroup size, multiple rows are assigned to the subgroup.
The first row is assigned to the first set of invocations, then the next row is assigned to the next set of invocations, and so on.
3. If the padded block width is greater than the subgroup size, multiple elements of each block row are assigned to each invocation.
The first set of elements is assigned to the first invocation, then the next set of elements is assigned to the next invocation, and so on.

In all cases, the lower numbered columns are assigned to the lower numbered invocations.
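The three scenarios can be sketched as a function that lists, for each invocation, the (row, column) positions it is assigned. The names `Coord` and `mapBlockToInvocations` are hypothetical. The sketch assumes the width has already been padded and, when several rows share the subgroup, that the block height is a multiple of the number of rows per pass.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Coord = std::pair<int, int>;  // (row, column) within the block

// Returns, for each invocation, the block positions assigned to it, in order.
std::vector<std::vector<Coord>> mapBlockToInvocations(int paddedWidth,
                                                      int height,
                                                      int subgroupSize) {
    assert(paddedWidth > 0 && height > 0 && subgroupSize > 0);
    std::vector<std::vector<Coord>> result(subgroupSize);
    if (paddedWidth >= subgroupSize) {
        // Scenarios 1 and 3: each invocation gets one or more consecutive
        // elements of every row.
        assert(paddedWidth % subgroupSize == 0);
        const int perInvocation = paddedWidth / subgroupSize;
        for (int row = 0; row < height; ++row)
            for (int inv = 0; inv < subgroupSize; ++inv)
                for (int k = 0; k < perInvocation; ++k)
                    result[inv].push_back({row, inv * perInvocation + k});
    } else {
        // Scenario 2: several rows are spread across the subgroup per pass.
        assert(subgroupSize % paddedWidth == 0);
        const int rowsPerPass = subgroupSize / paddedWidth;
        assert(height % rowsPerPass == 0);  // assumption of this sketch
        for (int firstRow = 0; firstRow < height; firstRow += rowsPerPass)
            for (int inv = 0; inv < subgroupSize; ++inv)
                result[inv].push_back(
                    {firstRow + inv / paddedWidth, inv % paddedWidth});
    }
    return result;
}
```

For a padded block width of two, a block height of four, and a subgroup size of four, invocation 2 receives positions 1,0 and 3,0.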

Examples

1. Loading a two row by four column block of elements (Block Width equals four, Block Height equals two), with a subgroup size of four, using OpSubgroup2DBlockLoadINTEL:

    Block data:

    |=====
    | 0,0 | 0,1 | 0,2 | 0,3
    | 1,0 | 1,1 | 1,2 | 1,3
    |=====

    This is the case where the padded block width is equal to the subgroup size.
    In this case, each invocation is assigned one element of the block row.
    Therefore, because there are two rows:

    * Invocation 0 is assigned the values 0,0 and 1,0.
    * Invocation 1 is assigned the values 0,1 and 1,1.
    * Invocation 2 is assigned the values 0,2 and 1,2.
    * Invocation 3 is assigned the values 0,3 and 1,3.

2. Loading a four row by two column block of elements (Block Width equals two, Block Height equals four), with a subgroup size of four, using OpSubgroup2DBlockLoadINTEL:

    Block data:

    |=====
    | 0,0 | 0,1
    | 1,0 | 1,1
    | 2,0 | 2,1
    | 3,0 | 3,1
    |=====

    This is the case where the padded block width is less than the subgroup size.
    In this case, the first row is assigned to Invocation 0 and Invocation 1, and the second row is assigned to Invocation 2 and Invocation 3, and so on.
    Therefore:

    * Invocation 0 is assigned the values 0,0 and 2,0.
    * Invocation 1 is assigned the values 0,1 and 2,1.
    * Invocation 2 is assigned the values 1,0 and 3,0.
    * Invocation 3 is assigned the values 1,1 and 3,1.

3. Loading a two row by eight column block of elements (Block Width equals eight, Block Height equals two), with a subgroup size of four, using OpSubgroup2DBlockLoadINTEL:

    Block data:

    |=====
    | 0,0 | 0,1 | 0,2 | 0,3 | 0,4 | 0,5 | 0,6 | 0,7
    | 1,0 | 1,1 | 1,2 | 1,3 | 1,4 | 1,5 | 1,6 | 1,7
    |=====

    This is the case where the padded block width is greater than the subgroup size.
    In this case, the first set of elements of each block row is assigned to Invocation 0, the next set of elements is assigned to Invocation 1, and so on.
    Therefore:

    * Invocation 0 is assigned the values 0,0, 0,1, 1,0, and 1,1.
    * Invocation 1 is assigned the values 0,2, 0,3, 1,2, and 1,3.
    * Invocation 2 is assigned the values 0,4, 0,5, 1,4, and 1,5.
    * Invocation 3 is assigned the values 0,6, 0,7, 1,6, and 1,7.

4. Loading a four row by two column block of elements (Block Width equals two, Block Height equals four), with a subgroup size of four, using OpSubgroup2DBlockLoadTransposeINTEL:

    Block data (pre-transpose):

    |=====
    | 0,0 | 0,1
    | 1,0 | 1,1
    | 2,0 | 2,1
    | 3,0 | 3,1
    |=====

    After transposition, the block has the same two row by four column shape as the first example, so:

    * Invocation 0 is assigned the values 0,0 and 0,1.
    * Invocation 1 is assigned the values 1,0 and 1,1.
    * Invocation 2 is assigned the values 2,0 and 2,1.
    * Invocation 3 is assigned the values 3,0 and 3,1.

5. Loading a two row by four column block of two-byte elements (Block Width equals four, Block Height equals two), with a subgroup size of four, using OpSubgroup2DBlockLoadTransformINTEL:

    Block data:

    |=====
    | 0,0 | 0,1 | 0,2 | 0,3
    | 1,0 | 1,1 | 1,2 | 1,3
    |=====

    For two-byte elements, the transform operation combines every two rows together to form a 32-bit value.
    Therefore:

    * Invocation 0 is assigned the 32-bit value 1,0 | 0,0.
    * Invocation 1 is assigned the 32-bit value 1,1 | 0,1.
    * Invocation 2 is assigned the 32-bit value 1,2 | 0,2.
    * Invocation 3 is assigned the 32-bit value 1,3 | 0,3.

6. Loading a four row by four column block of one-byte elements (Block Width equals four, Block Height equals four), with a subgroup size of four, using OpSubgroup2DBlockLoadTransformINTEL:

    Block data:

    |=====
    | 0,0 | 0,1 | 0,2 | 0,3
    | 1,0 | 1,1 | 1,2 | 1,3
    | 2,0 | 2,1 | 2,2 | 2,3
    | 3,0 | 3,1 | 3,2 | 3,3
    |=====

    For one-byte elements, the transform operation combines every four rows together to form a 32-bit value.
    Therefore:

    * Invocation 0 is assigned the 32-bit value 3,0 | 2,0 | 1,0 | 0,0.
    * Invocation 1 is assigned the 32-bit value 3,1 | 2,1 | 1,1 | 0,1.
    * Invocation 2 is assigned the 32-bit value 3,2 | 2,2 | 1,2 | 0,2.
    * Invocation 3 is assigned the 32-bit value 3,3 | 2,3 | 1,3 | 0,3.

Out-of-Bounds Behavior

If some or all of the 2D block is out-of-bounds, where the bounds are defined by the Memory Width and Memory Height, the behavior is as follows:

* For loads, any out-of-bounds elements are assigned the value zero.
* For prefetches and stores, any out-of-bounds elements are ignored.
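The load case can be modeled with a small host-side reference function. This is a sketch under stated assumptions, not a definition of the hardware behavior: `loadElementOrZero` is a hypothetical name, an element is treated as in bounds only when all of its bytes lie within Memory Width and its row lies within Memory Height, negative coordinates are treated as out-of-bounds, and bytes are assembled least significant first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference model of loading one element.
// x is in elements, y is in rows; memoryWidth and memoryPitch are in bytes.
std::uint64_t loadElementOrZero(const std::vector<std::uint8_t>& memory,
                                int elementSize, int memoryWidth,
                                int memoryHeight, int memoryPitch,
                                int x, int y) {
    assert(elementSize == 1 || elementSize == 2 ||
           elementSize == 4 || elementSize == 8);
    // Assumption: the whole element must fit inside Memory Width.
    const bool inBounds =
        x >= 0 && y >= 0 && y < memoryHeight &&
        (static_cast<std::int64_t>(x) + 1) * elementSize <= memoryWidth;
    if (!inBounds) return 0;  // out-of-bounds loads yield zero

    const std::size_t offset =
        static_cast<std::size_t>(y) * static_cast<std::size_t>(memoryPitch) +
        static_cast<std::size_t>(x) * static_cast<std::size_t>(elementSize);
    assert(offset + static_cast<std::size_t>(elementSize) <= memory.size());

    // Assumption: little-endian element layout in memory.
    std::uint64_t value = 0;
    for (int i = 0; i < elementSize; ++i) {
        value |= static_cast<std::uint64_t>(
                     memory[offset + static_cast<std::size_t>(i)])
                 << (8 * i);
    }
    return value;
}
```

Bytes that lie between the Memory Width and the Memory Pitch are never returned in this model, even though they are backed by real memory, because the bounds come from the Memory Width rather than from the pitch.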

Restrictions

The following restrictions apply to the 2D block load, store, and prefetch instructions added by this extension:

* The Element Size must be 1, 2, 4, or 8 bytes.
* The Block Width must be a multiple of four for 1-byte elements, or a multiple of two for 2-byte elements.
* Behavior is undefined unless:
** the first component of Coordinate is a multiple of four for 1-byte elements, or a multiple of two for 2-byte elements.
** the per-subgroup source or destination base address is cache-line aligned (64 bytes).
** the per-invocation source or destination address is aligned to a multiple of the Element Size.
** the Memory Width is greater than or equal to 64 bytes and less than or equal to 2^24 bytes.
** the Memory Height is greater than zero and less than or equal to 2^24 rows.
** the Memory Pitch is greater than or equal to the Memory Width and a multiple of 8 bytes.
** the SubgroupMaxSize is a power of two.
** the SubgroupSize is equal to the SubgroupMaxSize; in other words, this is a full subgroup.
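A tool that generates these instructions might check some of the restrictions up front. The sketch below covers only a subset: it omits the upper bounds on Memory Width and Memory Height, the per-invocation alignment, and the subgroup size conditions. `BlockAccess`, `isMultipleOf`, and `satisfiesCheckedRestrictions` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one 2D block access; not part of the extension.
struct BlockAccess {
    int elementSize;            // bytes
    int blockWidth;             // elements
    std::uint64_t baseAddress;  // per-subgroup base address
    std::int64_t memoryWidth;   // bytes
    std::int64_t memoryHeight;  // rows
    std::int64_t memoryPitch;   // bytes
    std::int64_t coordX;        // first component of Coordinate, in elements
};

// True if value is an exact multiple of m.
bool isMultipleOf(std::int64_t value, std::int64_t m) {
    assert(m > 0);
    return value % m == 0;
}

// Checks a subset of the restrictions listed above.
bool satisfiesCheckedRestrictions(const BlockAccess& a) {
    const bool sizeOk = a.elementSize == 1 || a.elementSize == 2 ||
                        a.elementSize == 4 || a.elementSize == 8;
    if (!sizeOk) return false;
    // Multiple of four for 1-byte elements, multiple of two for 2-byte elements.
    const std::int64_t multiple =
        a.elementSize == 1 ? 4 : (a.elementSize == 2 ? 2 : 1);
    return isMultipleOf(a.blockWidth, multiple) &&
           isMultipleOf(a.coordX, multiple) &&
           a.baseAddress % 64 == 0 &&  // cache-line aligned base address
           a.memoryWidth >= 64 &&
           a.memoryHeight > 0 &&
           a.memoryPitch >= a.memoryWidth &&
           isMultipleOf(a.memoryPitch, 8);
}
```

Only the Element Size and Block Width rules are stated as requirements. The remaining conditions are stated as undefined behavior, so a check like this is most useful as a debugging aid rather than as a validity test.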

Issues

1. How should this functionality work with untyped pointers (AKA opaque pointers)?

    RESOLVED: Added an Element Size operand to explicitly specify the amount of data to load or store, rather than inferring the element size from typed pointers.
    Note that this extension does not currently include optional Memory Operands to specify pointer alignment, because the pointer must already be aligned due to hardware restrictions.

2. Can we use a 32-bit integer-type scalar to represent the memory width, height, and pitch, or should we allow for 64-bit integers for very large matrices?

    RESOLVED: We will use 32-bit integer-type scalars to represent the block width, height, and count, but we will allow for 64-bit integers to represent the memory width, height, and pitch, and for the block start coordinates.

    The client API environment specs will restrict all of these operands to 32-bit integers initially, however.

3. Terminology-wise, should we use "width" and "height", or "rows" and "columns"?

    RESOLVED: We will use "width" and "height" to describe both the block dimensions and the memory dimensions.

4. Terminology-wise, how should we describe the coordinate to read?

    RESOLVED: The operand will simply be described as a vector coordinate.
    This avoids needing to describe "X" or "Y" or "Row" or "Column" in the operand names.
    The first coordinate will be the "X" or "Column" coordinate, and the second coordinate will be the "Y" or "Row" coordinate.

5. Terminology-wise, should we use "load" and "store", or "read" and "write"?

    RESOLVED: We will use "load" and "store" for consistency with the rest of the SPIR-V specification.

6. What should the behavior be if some or all of the 2D block is out-of-bounds?

    RESOLVED: The behavior is well-defined.
    Specifically, out-of-bounds reads are assigned the value zero, and out-of-bounds prefetches and stores are ignored.

Revision History

[cols="5,15,15,70"]
[options="header"]
|=====
| Rev | Date | Author | Changes
| 1 | 2025-01-07 | Ben Ashbaugh | Initial revision for publication
|=====