Buffer Device Address in HLSL #690

Merged

Changes from 7 commits · 78 commits total
95ca151
Change submodule dep
nipunG314 Mar 25, 2024
f803f11
Add integral bda type that supports atomics
nipunG314 Apr 30, 2024
c2b0409
Add workaround for DXC issue #6576
nipunG314 May 1, 2024
bc1f092
Refactor the counting sort logic into general utility
nipunG314 May 2, 2024
42edf28
Separate the bda types into their own headers
nipunG314 May 3, 2024
397f018
Clean up barriers
nipunG314 May 3, 2024
2c57524
Load consecutive values in consecutive invocations
nipunG314 May 3, 2024
a169f62
Wrap bda::__ptr in bda::PtrAccessor
nipunG314 May 6, 2024
c7a71b6
Move Accessor definition to userspace
nipunG314 May 6, 2024
f00c21c
__ptr always returns __ref<T,alignment,false>
nipunG314 May 7, 2024
6d47af6
Remove obsolete TODO
nipunG314 May 7, 2024
04f2eb4
Don't hardcode SPIRV opcodes
nipunG314 May 7, 2024
9b77858
Don't hardcode SPIRV capabilities and storage class
nipunG314 May 7, 2024
7fb1818
Refactor addrof into free floating pointer_to
nipunG314 May 7, 2024
8ec4e12
Use NBL_REF_ARG for Accessors
nipunG314 May 7, 2024
724f997
Add atomic overloads for DXC workaround
nipunG314 May 8, 2024
5d5a473
Add bitcast overload for DXC workaround
nipunG314 May 8, 2024
ac73aff
Fix missing typename syntax
nipunG314 May 8, 2024
2c5852c
Add conditional_capability_v and is_physical_storage_ptr type traits
nipunG314 May 8, 2024
9bc9936
Revert "Add conditional_capability_v and is_physical_storage_ptr type…
nipunG314 May 9, 2024
0fe8368
Make atomicAdd a free floating operation
nipunG314 May 9, 2024
fbd3911
Check for 64bit integers in atomicAdd
nipunG314 May 9, 2024
79958df
Add the enable_if checking to all atomics
nipunG314 May 9, 2024
923c1c5
Move OpLoad and OpStore instructions to spirv_intrinsics
nipunG314 May 9, 2024
ad506c7
Remove unused bitcast
nipunG314 May 20, 2024
6f7214a
Add enable_if<is_spirv_type_v<>> checks to all atomics
nipunG314 May 20, 2024
fb2d887
Remove accessChain for now
nipunG314 May 20, 2024
147ab16
spriv => spirv
nipunG314 May 20, 2024
790926e
Fix typo
nipunG314 May 20, 2024
3fdbfb2
Correctly namespace bda
nipunG314 May 20, 2024
b366ef0
Add operator + and - to __ptr
nipunG314 May 20, 2024
acf9556
Add alignment_of_v
nipunG314 May 20, 2024
dd9734a
Fix alignment_of_v
nipunG314 May 20, 2024
1c9aaee
Use alignment_of_v in __ref
nipunG314 May 20, 2024
d471a8c
Use enable_if for int64atomic checks
nipunG314 May 20, 2024
52a1d84
Fix missing variable in pointer_to()
nipunG314 May 20, 2024
f1988bc
Remove unnecessary nbl::hlsl:: tags
nipunG314 May 22, 2024
11ebb07
Add enable_if_t
nipunG314 May 22, 2024
9c5d833
Add integral checks on atomic intrinsics
nipunG314 May 22, 2024
48b6c8c
Merge nbl::hlsl::experimental into nbl::hlsl::spirv
nipunG314 May 24, 2024
37dbcb4
Refactor CountingPushData into CountingParameters
nipunG314 May 24, 2024
1531a07
Replace get_ptr with atomicAdd in the accessor
nipunG314 May 25, 2024
551d928
Remove unnecessary barrier
nipunG314 May 25, 2024
5f2bedb
Add variable names for better readability
nipunG314 May 25, 2024
bbaac65
Compute buckets_per_thread using integer division
nipunG314 May 25, 2024
b0c3bcf
Move the globals into userspace code
nipunG314 May 25, 2024
993737b
Use sdata's barrier
nipunG314 May 25, 2024
1c22895
Rename index to baseIndex
nipunG314 May 25, 2024
09723eb
Refactor histogram to use inclusive scans and reuse sdata
nipunG314 May 26, 2024
480d13c
Minimize the number of atomic calls to global scratch buffer
nipunG314 May 26, 2024
1f80270
Separate histogram building into one method
nipunG314 May 26, 2024
6b0df6f
Add robust parameter
nipunG314 May 26, 2024
128478d
Move PtrAccessor into a templated BdaAccessor
nipunG314 May 26, 2024
2ef2980
Replace scratch with histogram
nipunG314 May 27, 2024
839ec27
Pass bda::__ptr<T> into BdaAccessor
nipunG314 May 27, 2024
d445e92
Add atomicSub instruction
nipunG314 May 27, 2024
492456c
Call sdata.get(index) only once
nipunG314 May 28, 2024
5447231
Fetch value only after robustness has been checked
nipunG314 May 28, 2024
9c0a80d
Move bitcast after its overloads
nipunG314 May 28, 2024
a52f3a1
Make the buckets_per_thread calculation simpler
nipunG314 May 28, 2024
4418274
Rename data to params
nipunG314 May 28, 2024
79ec69f
Pass workGroupIndex as a parameter
nipunG314 May 28, 2024
71953a1
Hoist the is_last_wg_invocation check
nipunG314 May 28, 2024
08d005e
Only check uint32_t for Unsigned and Signed for int32_t
nipunG314 May 28, 2024
11cccc5
Add enable_if_t checks to bitFieldExtract functions
nipunG314 May 28, 2024
78d7462
Simplify the loops in build_histogram
nipunG314 May 28, 2024
db707af
Simplify the loops in scatter
nipunG314 May 28, 2024
092ef9f
Clamp endIndex to params.dataElementCount
nipunG314 May 28, 2024
f53f6c4
Update index correctly
nipunG314 May 28, 2024
e1e028b
Simplify the loop in histogram
nipunG314 May 28, 2024
f46073d
Add toroidal_histogram_add
nipunG314 May 28, 2024
f4d7249
Make changes to counting.hlsl
nipunG314 May 31, 2024
ad26699
Fix the histogram loop
nipunG314 May 31, 2024
4648b14
Rework the toroidal access on the scatter shader and add comments
nipunG314 May 31, 2024
309ab59
Guard histogram.atomicAdd
nipunG314 Jun 1, 2024
248a2a8
Update submodule
nipunG314 Jun 1, 2024
5a4ad9e
Compute inclusive_scan outside of conditional
nipunG314 Jun 1, 2024
9179d1e
Update examples_tests
nipunG314 Jun 1, 2024
2 changes: 1 addition & 1 deletion .gitmodules
@@ -69,7 +69,7 @@
branch = main
[submodule "examples_tests"]
path = examples_tests
url = [email protected]:Devsh-Graphics-Programming/Nabla-Examples-and-Tests.git
url = [email protected]:nipunG314/Nabla-Examples-and-Tests.git
[submodule "3rdparty/dxc/dxc"]
path = 3rdparty/dxc/dxc
url = [email protected]:Devsh-Graphics-Programming/DirectXShaderCompiler.git
37 changes: 37 additions & 0 deletions include/nbl/builtin/hlsl/bda/__ptr.hlsl
@@ -0,0 +1,37 @@
// Copyright (C) 2018-2024 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#include "nbl/builtin/hlsl/bda/__ref.hlsl"

#ifndef _NBL_BUILTIN_HLSL_BDA_PTR_INCLUDED_
#define _NBL_BUILTIN_HLSL_BDA_PTR_INCLUDED_

namespace bda
{
template<typename T, bool _restrict>
struct __ptr
{
using this_t = __ptr < T, _restrict>;
uint64_t addr;

static this_t create(const uint64_t _addr)
{
this_t retval;
retval.addr = _addr;
return retval;
}

template<uint32_t alignment>
__ref<T,alignment,_restrict> deref()
{
// TODO: assert(addr&uint64_t(alignment-1)==0);
using retval_t = __ref < T, alignment, _restrict>;
retval_t retval;
retval.__init(impl::bitcast<typename retval_t::spv_ptr_t>(addr));
return retval;
}
};
}

#endif
128 changes: 128 additions & 0 deletions include/nbl/builtin/hlsl/bda/__ref.hlsl
Original file line number Diff line number Diff line change
@@ -0,0 +1,128 @@
// Copyright (C) 2018-2024 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#include "nbl/builtin/hlsl/cpp_compat.hlsl"
#include "nbl/builtin/hlsl/spirv_intrinsics/core.hlsl"

#ifndef _NBL_BUILTIN_HLSL_BDA_REF_INCLUDED_
#define _NBL_BUILTIN_HLSL_BDA_REF_INCLUDED_

namespace spirv
{
template<typename M, typename T, typename StorageClass>
[[vk::ext_instruction(/*spv::OpAccessChain*/65)]]
vk::SpirvOpaqueType </* OpTypePointer*/ 32,StorageClass,M> accessChain(
[[vk::ext_reference]] vk::SpirvOpaqueType </* OpTypePointer*/ 32,
StorageClass,T>base,
[[vk::ext_literal]] uint32_t index0
);
}

namespace bda
{
template<typename T>
using __spv_ptr_t = vk::SpirvOpaqueType<
/* OpTypePointer */ 32,
/* PhysicalStorageBuffer */ vk::Literal<vk::integral_constant<uint,5349> >,
T
>;

namespace impl
{
// this only exists to work around DXC issue 6576: https://github.com/microsoft/DirectXShaderCompiler/issues/6576
template<class T>
[[vk::ext_capability(/*PhysicalStorageBufferAddresses */ 5347 )]]
[[vk::ext_instruction(/*spv::OpBitcast*/124)]]
T bitcast(uint64_t);

template<typename T, typename P, uint32_t alignment>
[[vk::ext_capability( /*PhysicalStorageBufferAddresses */5347)]]
[[vk::ext_instruction( /*OpLoad*/61)]]
T load(P pointer, [[vk::ext_literal]] uint32_t __aligned = /*Aligned*/0x00000002, [[vk::ext_literal]] uint32_t __alignment = alignment);

template<typename T, typename P, uint32_t alignment >
[[vk::ext_capability( /*PhysicalStorageBufferAddresses */5347)]]
[[vk::ext_instruction( /*OpStore*/62)]]
void store(P pointer, T obj, [[vk::ext_literal]] uint32_t __aligned = /*Aligned*/0x00000002, [[vk::ext_literal]] uint32_t __alignment = alignment);

// TODO: atomics for different types
template<typename T, typename P> // integers operate on 2s complement so same op for signed and unsigned
[[vk::ext_instruction( /*spv::OpAtomicIAdd*/234)]]
T atomicIAdd(P ptr, uint32_t memoryScope, uint32_t memorySemantics, T value);
}

// TODO: maybe make normal and restrict separate distinct types instead of templates
template<typename T, bool _restrict = false>
struct __ptr;

template<typename T, uint32_t alignment, bool _restrict>
struct __base_ref
{
// TODO:
// static_assert(alignment>=alignof(T));

using spv_ptr_t = uint64_t;
spv_ptr_t ptr;

__spv_ptr_t<T> __get_spv_ptr()
{
return impl::bitcast < __spv_ptr_t<T> > (ptr);
}

// TODO: Would like to use `spv_ptr_t` or OpAccessChain result instead of `uint64_t`
void __init(const spv_ptr_t _ptr)
{
ptr = _ptr;
}

__ptr<T,_restrict> addrof()
{
__ptr<T,_restrict> retval;
retval.addr = nbl::hlsl::spirv::bitcast<uint64_t>(ptr);
return retval;
}

T load()
{
return impl::load < T, __spv_ptr_t<T>, alignment > (__get_spv_ptr());
}

void store(const T val)
{
impl::store < T, __spv_ptr_t<T>, alignment > (__get_spv_ptr(), val);
}
};

template<typename T, uint32_t alignment/*=alignof(T)*/, bool _restrict = false>
struct __ref : __base_ref<T,alignment,_restrict>
{
using base_t = __base_ref < T, alignment, _restrict>;
using this_t = __ref < T, alignment, _restrict>;
};

#define REF_INTEGRAL(Type) \
template<uint32_t alignment, bool _restrict> \
struct __ref<Type,alignment,_restrict> : __base_ref<Type,alignment,_restrict> \
{ \
using base_t = __base_ref <Type, alignment, _restrict>; \
using this_t = __ref <Type, alignment, _restrict>; \
\
[[vk::ext_capability(/*PhysicalStorageBufferAddresses */ 5347 )]] \
Type atomicAdd(const Type value) \
{ \
return impl::atomicIAdd <Type> (base_t::__get_spv_ptr(), 1, 0, value); \
} \
};

// TODO: specializations for simple builtin types that have atomics
// We are currently only supporting builtin types that work with atomicIAdd
REF_INTEGRAL(int16_t)
REF_INTEGRAL(uint16_t)
REF_INTEGRAL(int32_t)
REF_INTEGRAL(uint32_t)
REF_INTEGRAL(int64_t)
REF_INTEGRAL(uint64_t)
}

#endif
30 changes: 30 additions & 0 deletions include/nbl/builtin/hlsl/sort/common.hlsl
@@ -0,0 +1,30 @@
// Copyright (C) 2018-2024 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#ifndef _NBL_BUILTIN_HLSL_SORT_COMMON_INCLUDED_
#define _NBL_BUILTIN_HLSL_SORT_COMMON_INCLUDED_

namespace nbl
{
namespace hlsl
{
namespace sort
{

struct CountingPushData
{
uint64_t inputKeyAddress;
uint64_t inputValueAddress;
uint64_t scratchAddress;
uint64_t outputKeyAddress;
uint64_t outputValueAddress;
uint32_t dataElementCount;
uint32_t minimum;
uint32_t elementsPerWT;
};

}
}
}
#endif
158 changes: 158 additions & 0 deletions include/nbl/builtin/hlsl/sort/counting.hlsl
@@ -0,0 +1,158 @@
// Copyright (C) 2018-2024 - DevSH Graphics Programming Sp. z O.O.
// This file is part of the "Nabla Engine".
// For conditions of distribution and use, see copyright notice in nabla.h

#include "nbl/builtin/hlsl/bda/__ptr.hlsl"

#include "nbl/builtin/hlsl/sort/common.hlsl"
#include "nbl/builtin/hlsl/workgroup/arithmetic.hlsl"

#ifndef _NBL_BUILTIN_HLSL_SORT_COUNTING_INCLUDED_
#define _NBL_BUILTIN_HLSL_SORT_COUNTING_INCLUDED_

namespace nbl
{
namespace hlsl
{
namespace sort
{

NBL_CONSTEXPR uint32_t BucketsPerThread = ceil((float) BucketCount / WorkgroupSize);

groupshared uint32_t prefixScratch[BucketCount];

struct ScratchProxy
{
uint32_t get(const uint32_t ix)
{
return prefixScratch[ix];
}
void set(const uint32_t ix, const uint32_t value)
{
prefixScratch[ix] = value;
}

void workgroupExecutionAndMemoryBarrier()
{
nbl::hlsl::glsl::barrier();
}
};

static ScratchProxy arithmeticAccessor;

groupshared uint32_t sdata[BucketCount];

template<typename KeyAccessor, typename ValueAccessor, typename ScratchAccessor>
struct counting
{
void init(
const CountingPushData data
) {
in_key_addr = data.inputKeyAddress;
out_key_addr = data.outputKeyAddress;
in_value_addr = data.inputValueAddress;
out_value_addr = data.outputValueAddress;
scratch_addr = data.scratchAddress;
data_element_count = data.dataElementCount;
minimum = data.minimum;
elements_per_wt = data.elementsPerWT;
}

void histogram()
{
uint32_t tid = nbl::hlsl::workgroup::SubgroupContiguousIndex();

[unroll]
for (int i = 0; i < BucketsPerThread; i++)
sdata[BucketsPerThread * tid + i] = 0;
uint32_t index = (nbl::hlsl::glsl::gl_WorkGroupID().x * WorkgroupSize) * elements_per_wt;

nbl::hlsl::glsl::barrier();

for (int i = 0; i < elements_per_wt; i++)
{
int j = index + i * WorkgroupSize + tid;
if (j >= data_element_count)
break;
uint32_t value = ValueAccessor(in_value_addr + sizeof(uint32_t) * j).template deref<4>().load();
nbl::hlsl::glsl::atomicAdd(sdata[value - minimum], (uint32_t) 1);
}

nbl::hlsl::glsl::barrier();

uint32_t sum = 0;
uint32_t scan_sum = 0;

for (int i = 0; i < BucketsPerThread; i++)
{
sum = nbl::hlsl::workgroup::exclusive_scan < nbl::hlsl::plus < uint32_t >, WorkgroupSize > ::
template __call <ScratchProxy>
(sdata[WorkgroupSize * i + tid], arithmeticAccessor);

arithmeticAccessor.workgroupExecutionAndMemoryBarrier();

ScratchAccessor(scratch_addr + sizeof(uint32_t) * (WorkgroupSize * i + tid)).template deref<4>().atomicAdd(sum);
if ((tid == WorkgroupSize - 1) && i > 0)
ScratchAccessor(scratch_addr + sizeof(uint32_t) * (WorkgroupSize * i)).template deref<4>().atomicAdd(scan_sum);

arithmeticAccessor.workgroupExecutionAndMemoryBarrier();

if ((tid == WorkgroupSize - 1) && i < (BucketsPerThread - 1))
{
scan_sum = sum + sdata[WorkgroupSize * i + tid];
sdata[WorkgroupSize * (i + 1)] += scan_sum;
}
}
}

void scatter()
{
uint32_t tid = nbl::hlsl::workgroup::SubgroupContiguousIndex();

[unroll]
for (int i = 0; i < BucketsPerThread; i++)
sdata[BucketsPerThread * tid + i] = 0;
uint32_t index = (nbl::hlsl::glsl::gl_WorkGroupID().x * WorkgroupSize) * elements_per_wt;

nbl::hlsl::glsl::barrier();

[unroll]
for (int i = 0; i < elements_per_wt; i++)
{
int j = index + i * WorkgroupSize + tid;
if (j >= data_element_count)
break;
uint32_t key = KeyAccessor(in_key_addr + sizeof(uint32_t) * j).template deref<4>().load();
uint32_t value = ValueAccessor(in_value_addr + sizeof(uint32_t) * j).template deref<4>().load();
nbl::hlsl::glsl::atomicAdd(sdata[value - minimum], (uint32_t) 1);
}

nbl::hlsl::glsl::barrier();

[unroll]
for (int i = 0; i < elements_per_wt; i++)
{
int j = index + i * WorkgroupSize + tid;
if (j >= data_element_count)
break;
uint32_t key = KeyAccessor(in_key_addr + sizeof(uint32_t) * j).template deref<4>().load();
uint32_t value = ValueAccessor(in_value_addr + sizeof(uint32_t) * j).template deref<4>().load();
sdata[value - minimum] = ScratchAccessor(scratch_addr + sizeof(uint32_t) * (value - minimum)).template deref<4>().atomicAdd(1);
KeyAccessor(out_key_addr + sizeof(uint32_t) * sdata[value - minimum]).template deref<4>().store(key);
ValueAccessor(out_value_addr + sizeof(uint32_t) * sdata[value - minimum]).template deref<4>().store(value);
}
}

uint64_t in_key_addr, out_key_addr;
uint64_t in_value_addr, out_value_addr;
uint64_t scratch_addr;
uint32_t data_element_count;
uint32_t minimum;
uint32_t elements_per_wt;
};

}
}
}

#endif
8 changes: 6 additions & 2 deletions src/nbl/builtin/CMakeLists.txt
@@ -30,6 +30,9 @@ LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "glsl/barycentric/extensions.glsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "glsl/barycentric/frag.glsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "glsl/barycentric/vert.glsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "glsl/barycentric/utils.glsl")
#bda
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/bda/__ref.hlsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/bda/__ptr.hlsl")
# bump mapping
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "glsl/bump_mapping/fragment.glsl") # TODO: rename to `frag.glsl`
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "glsl/bump_mapping/utils.glsl")
@@ -286,8 +289,9 @@ LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/shapes/ellipse.hlsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/shapes/line.hlsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/shapes/beziers.hlsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/shapes/util.hlsl")


#sort
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/sort/common.hlsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/sort/counting.hlsl")
#subgroup
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/subgroup/ballot.hlsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/subgroup/basic.hlsl")