-
Notifications
You must be signed in to change notification settings - Fork 83
Bitonic sort2.0 #83
base: main
Are you sure you want to change the base?
Bitonic sort2.0 #83
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it looks great: well done!
There are a few things that need to be addressed, but this is some excellent work.
@@ -23,21 +23,23 @@ if (USE_COMPUTECPP) | |||
add_definitions(-DSYCL_PSTL_USE_OLD_ALGO) | |||
set(COMPUTECPP_DEVICE_COMPILER_FLAGS "${COMPUTECPP_DEVICE_COMPILER_FLAGS} -DSYCL_PSTL_USE_OLD_ALGO") | |||
|
|||
include_directories("${COMPUTECPP_INCLUDE_DIRECTORY}") | |||
include_directories("${ComputeCpp_DIR}/include") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably change this to target_include_directories
.
# | ||
# TODO: update CMakeLists.txt to work with updated FindComputeCpp.cmake | ||
# | ||
#add_subdirectory (tests) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No commented-out code in the MR please.
if (targetCxxStandard MATCHES 17) | ||
set(device_compiler_cxx_standard "-std=c++1z") | ||
elseif (targetCxxStandard MATCHES 14) | ||
set(device_compiler_cxx_standard "-std=c++14") | ||
elseif (targetCxxStandard MATCHES 11) | ||
set(device_compiler_cxx_standard "-std=c++11") | ||
elseif (targetCxxStandard MATCHES 98) | ||
message(FATAL_ERROR "SYCL implementations cannot be compiled using C++98") | ||
message(FATAL_ERROR "SYCL applications cannot be compiled using C++98") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add a check for C++03 too?
bool sorted = true; | ||
|
||
std::cout << __FUNCTION__ << "<" << typename_as_str<T>::name_ << ">" | ||
<< std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
endl
isn't necessary here, please replace with "\n".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... "\n"
sounds so old-fashioned, especially overkill compared to '\n'
.
My stylistic C++ mood push me towards std::endl
anyway... :-)
std::cout << "out : "; | ||
for (size_t j = 0; j < v.size(); j++) { | ||
std::cout << (v[j]) << (j == v.size() - 1 ? "" : ", "); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ibid.
for (size = 2; size < item.get_local_range(0); size <<= 1) { | ||
dir = (item.get_local_id(0) / size & 1) * -1; | ||
|
||
for (stride = size; stride > 1; stride >>= 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is interesting. Why are we halving the stride each time?
typename bitonic_sort_base<T, U>::global_buffer_accessor_t m_globalBuf; | ||
typename bitonic_sort_base<T, U>::local_buffer_accessor_t m_localBuf; | ||
const unsigned int m_stage; | ||
const int mDir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please choose a name that's more meaningful than "dir" (I don't know what that is).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting direction. Zero means 'ascending', one means 'descending'.
@@ -233,6 +1014,7 @@ void bitonic_sort(cl::sycl::queue q, cl::sycl::buffer<T, 1, Alloc> buf, | |||
} // passStage | |||
} // stage | |||
} // bitonic_sort | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please provide a comment to indicate which #if
you're ending.
#include <experimental/algorithm> | ||
#include <sycl/execution_policy> | ||
|
||
using namespace std::experimental::parallel; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is used at all.
#include <vector> | ||
|
||
#include <experimental/algorithm> | ||
#include <sycl/execution_policy> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please place these two above the other three includes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many thanks for the contribution. Seems to me there should be some additional clarifications on the algorithm, specially all the magic values, so it is clearer how it is work. An overall explanation may be useful.
|
||
add_sycl_to_target(${SOURCE_NAME} ${CMAKE_CURRENT_BINARY_DIR} | ||
add_sycl_to_target( TARGET ${SOURCE_NAME} SOURCES |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where are all this CMake changes coming from? did you have to do them or you just updated the cmake from some other project?
|
||
sycl::sycl_execution_policy<> sycl_policy; | ||
|
||
/* This sample tests the updated multi-kernel bitonic sort implementation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is either a sample or a test, cannot be both! :-)
If this is the test, move it to the test directory and create a simpler sample.
Otherwise, create a simpler sample !
sorted = sorted && test<float>(); | ||
sorted = sorted && test<double>(); | ||
|
||
return !sorted; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks like a test to me!
namespace emulated_shuffle_builtins { | ||
|
||
template <typename vec_type> | ||
typename vec_type::element_type get_vector_component(vec_type &x, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need the vector passed as reference here if its not modified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At least const vec_type &x
} | ||
} | ||
|
||
static void set_bits32(cl::sycl::cl_uint *const dst, const cl::sycl::cl_uint src, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add some documentation explaining what this method is trying to achieve?
static_assert(std::is_arithmetic<T>::value, | ||
"Bitonic sort implementation only works with arithmetic types"); | ||
static_assert( | ||
U == 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the 4 is in several places, worth promoting it to a constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
|
||
relational_op_vec_type_ add1(1, 1, 3, 3); | ||
relational_op_vec_type_ add2(2, 3, 2, 3); | ||
relational_op_vec_type_ add3(4, 5, 6, 7); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need some explanation for the numbers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See book: OpenCL in Action by Matthew Scarpino (Manning Publication)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use { }
void bitonic_sort(cl::sycl::queue q, cl::sycl::buffer<T, 1, Alloc> buf, | ||
size_t vectorSize) { | ||
using namespace cl::sycl; | ||
int direction = 0 /*0 = ascending, -1 = descending*/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a configuration parameter, should be an enum class
include/sycl/algorithm/sort.hpp
Outdated
.wait(); | ||
|
||
q.wait_and_throw(); | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this return here? there is code afterwards
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debugging artifact.
include/sycl/algorithm/sort.hpp
Outdated
cgh.parallel_for<kernel_bitonic_sort_merge<T, Alloc>>( | ||
ndrange, bitonic_sort_merge<T, 4>(g, l, stage, direction)); | ||
}) | ||
.wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you have to wait immediately after every submit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything in quote format is from @chitalu.
} | ||
} | ||
|
||
static void set_bits32(cl::sycl::cl_uint *const dst, const cl::sycl::cl_uint src, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're setting a range of bits at a given offset.
read_bits32(mask.s0(), 0, k), read_bits32(mask.s1(), 0, k), | ||
read_bits32(mask.s2(), 0, k), read_bits32(mask.s3(), 0, k)); | ||
|
||
for (int i = 0; i < 4; ++i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emulating vector with four components
using namespace cl::sycl; | ||
vec<gentypem_type, gentypem_size> ret; | ||
|
||
const unsigned int k = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the spec for the shuffle built-in, only the k-least significant bits are used to determine the value used to select the component from the vector
x
(ory
),
static_assert(std::is_arithmetic<T>::value, | ||
"Bitonic sort implementation only works with arithmetic types"); | ||
static_assert( | ||
U == 4, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above.
|
||
relational_op_vec_type_ add1(1, 1, 3, 3); | ||
relational_op_vec_type_ add2(2, 3, 2, 3); | ||
relational_op_vec_type_ add3(4, 5, 6, 7); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See book: OpenCL in Action by Matthew Scarpino (Manning Publication)
typename bitonic_sort_base<T, U>::global_buffer_accessor_t m_globalBuf; | ||
typename bitonic_sort_base<T, U>::local_buffer_accessor_t m_localBuf; | ||
const unsigned int m_stage; | ||
const int mDir; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting direction. Zero means 'ascending', one means 'descending'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great to have a new sorting algorithm.
Just a few stylistic comments.
@@ -0,0 +1,152 @@ | |||
/* Copyright (c) 2015 The Khronos Group Inc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am curious about why 2015 here...
* the device. | ||
* Note that for the moment the sycl variants of the algorithm | ||
* are on the sycl namespace and not in std::experimental. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use SYCL
when you are meaning the standard :-)
bool sorted = true; | ||
|
||
std::cout << __FUNCTION__ << "<" << typename_as_str<T>::name_ << ">" | ||
<< std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm... "\n"
sounds so old-fashioned, especially overkill compared to '\n'
.
My stylistic C++ mood push me towards std::endl
anyway... :-)
for (int i = minInputSizeLog2; i <= maxInputSizeLog2; ++i) { | ||
|
||
std::vector<T> v; | ||
v.resize(1 << i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious.
Why not
std::vector<T> v(1 << i);
?
|
||
std::cout << "in : "; | ||
for (int j = 0; j < v.size(); ++j) { | ||
v[j] = init_num(static_cast<T>((v.size() - 1) - j), v.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be a v.emplace_back(...)
and then we could remove v
pre-allocation...
But not that important for a test anyway...
const typename bitonic_sort_base<T, U>::local_buffer_accessor_t &localBuf, | ||
const unsigned int stage, const int dir) | ||
: m_globalBuf(globalBuf), m_localBuf(localBuf), m_stage(stage), | ||
mDir(dir) {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ }
typedef typename bitonic_sort_base<T, U>::data_vec_type data_vec_type_; | ||
typedef | ||
typename bitonic_sort_base<T, U>::mask_op_vec_type mask_op_vec_type_; | ||
typedef typename bitonic_sort_base<T, U>::relational_op_vec_type |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using
relational_op_vec_type_ comp, add; | ||
unsigned int global_start, global_offset; | ||
|
||
add = relational_op_vec_type_(4, 5, 6, 7); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
{ }
cgh.parallel_for<kernel_bitonic_sort_phase_stage_n<T, Alloc>>( | ||
ndrange, bitonic_sort_stage_n<T, 4>(g, l, stage, high_stage)); | ||
}) | ||
.wait(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious to have a wait()
input1 = shuffle2(input1, input2, (comp).template as<mask_op_vec_type_>()); | ||
input2 = shuffle2(input2, temp, (comp).template as<mask_op_vec_type_>()); | ||
VECTOR_SORT(input1, mDir); | ||
VECTOR_SORT(input2, mDir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By browsing some code that seems similar at different level accress different kernels, I wonder whether it could not be possible to have some even more generic code and code refactorizing...
Floyd seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
Summary
This pull request modifies SyclParallelSTL by adding an improved version of the the sort algorithm.
Updates:
Update sorting algorithm implementation
Update cmake files to use FindComputeCpp.cmake from ComputeCpp version 0.9.1
Add sample file which tests the new sorting algorithm
Notes
Since SYCL does not currently have the
shuffle
builtins, an additional namespace is added in "sort.hpp" containing emulated copies. Current these emulated builtin are hard-coded for vectors with 4 components.The Cmake files in the tests directory are yet to be updated