Skip to content
This repository has been archived by the owner on Jan 26, 2024. It is now read-only.

Bitonic sort2.0 #83

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open

Conversation

chitalu
Copy link

@chitalu chitalu commented Aug 22, 2018

Summary

This pull request modifies SyclParallelSTL by adding an improved version of the the sort algorithm.

Updates:

  • Update sorting algorithm implementation

  • Update cmake files to use FindComputeCpp.cmake from ComputeCpp version 0.9.1

  • Add sample file which tests the new sorting algorithm

Notes
Since SYCL does not currently have the shuffle builtins, an additional namespace is added in "sort.hpp" containing emulated copies. Current these emulated builtin are hard-coded for vectors with 4 components.
The Cmake files in the tests directory are yet to be updated

Copy link
Collaborator

@cjdb cjdb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it looks great: well done!

There are a few things that need to be addressed, but this is some excellent work.

@@ -23,21 +23,23 @@ if (USE_COMPUTECPP)
add_definitions(-DSYCL_PSTL_USE_OLD_ALGO)
set(COMPUTECPP_DEVICE_COMPILER_FLAGS "${COMPUTECPP_DEVICE_COMPILER_FLAGS} -DSYCL_PSTL_USE_OLD_ALGO")

include_directories("${COMPUTECPP_INCLUDE_DIRECTORY}")
include_directories("${ComputeCpp_DIR}/include")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably change this to target_include_directories.

#
# TODO: update CMakeLists.txt to work with updated FindComputeCpp.cmake
#
#add_subdirectory (tests)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No commented-out code in the MR please.

if (targetCxxStandard MATCHES 17)
set(device_compiler_cxx_standard "-std=c++1z")
elseif (targetCxxStandard MATCHES 14)
set(device_compiler_cxx_standard "-std=c++14")
elseif (targetCxxStandard MATCHES 11)
set(device_compiler_cxx_standard "-std=c++11")
elseif (targetCxxStandard MATCHES 98)
message(FATAL_ERROR "SYCL implementations cannot be compiled using C++98")
message(FATAL_ERROR "SYCL applications cannot be compiled using C++98")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please add a check for C++03 too?

bool sorted = true;

std::cout << __FUNCTION__ << "<" << typename_as_str<T>::name_ << ">"
<< std::endl;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

endl isn't necessary here, please replace with "\n".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... "\n" sounds so old-fashioned, especially overkill compared to '\n'.
My stylistic C++ mood push me towards std::endl anyway... :-)

std::cout << "out : ";
for (size_t j = 0; j < v.size(); j++) {
std::cout << (v[j]) << (j == v.size() - 1 ? "" : ", ");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ibid.

for (size = 2; size < item.get_local_range(0); size <<= 1) {
dir = (item.get_local_id(0) / size & 1) * -1;

for (stride = size; stride > 1; stride >>= 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is interesting. Why are we halving the stride each time?

typename bitonic_sort_base<T, U>::global_buffer_accessor_t m_globalBuf;
typename bitonic_sort_base<T, U>::local_buffer_accessor_t m_localBuf;
const unsigned int m_stage;
const int mDir;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please choose a name that's more meaningful than "dir" (I don't know what that is).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting direction. Zero means 'ascending', one means 'descending'.

@@ -233,6 +1014,7 @@ void bitonic_sort(cl::sycl::queue q, cl::sycl::buffer<T, 1, Alloc> buf,
} // passStage
} // stage
} // bitonic_sort
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please provide a comment to indicate which #if you're ending.

#include <experimental/algorithm>
#include <sycl/execution_policy>

using namespace std::experimental::parallel;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is used at all.

#include <vector>

#include <experimental/algorithm>
#include <sycl/execution_policy>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please place these two above the other three includes.

Copy link
Contributor

@Ruyk Ruyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks for the contribution. Seems to me there should be some additional clarifications on the algorithm, specially all the magic values, so it is clearer how it is work. An overall explanation may be useful.


add_sycl_to_target(${SOURCE_NAME} ${CMAKE_CURRENT_BINARY_DIR}
add_sycl_to_target( TARGET ${SOURCE_NAME} SOURCES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where are all this CMake changes coming from? did you have to do them or you just updated the cmake from some other project?


sycl::sycl_execution_policy<> sycl_policy;

/* This sample tests the updated multi-kernel bitonic sort implementation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is either a sample or a test, cannot be both! :-)
If this is the test, move it to the test directory and create a simpler sample.
Otherwise, create a simpler sample !

sorted = sorted && test<float>();
sorted = sorted && test<double>();

return !sorted;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks like a test to me!

namespace emulated_shuffle_builtins {

template <typename vec_type>
typename vec_type::element_type get_vector_component(vec_type &x,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need the vector passed as reference here if its not modified?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least const vec_type &x

}
}

static void set_bits32(cl::sycl::cl_uint *const dst, const cl::sycl::cl_uint src,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add some documentation explaining what this method is trying to achieve?

static_assert(std::is_arithmetic<T>::value,
"Bitonic sort implementation only works with arithmetic types");
static_assert(
U == 4,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the 4 is in several places, worth promoting it to a constant

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.


relational_op_vec_type_ add1(1, 1, 3, 3);
relational_op_vec_type_ add2(2, 3, 2, 3);
relational_op_vec_type_ add3(4, 5, 6, 7);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need some explanation for the numbers

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See book: OpenCL in Action by Matthew Scarpino (Manning Publication)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use { }

void bitonic_sort(cl::sycl::queue q, cl::sycl::buffer<T, 1, Alloc> buf,
size_t vectorSize) {
using namespace cl::sycl;
int direction = 0 /*0 = ascending, -1 = descending*/;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a configuration parameter, should be an enum class

.wait();

q.wait_and_throw();
return;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this return here? there is code afterwards

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

debugging artifact.

cgh.parallel_for<kernel_bitonic_sort_merge<T, Alloc>>(
ndrange, bitonic_sort_merge<T, 4>(g, l, stage, direction));
})
.wait();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you have to wait immediately after every submit?

Copy link
Collaborator

@cjdb cjdb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything in quote format is from @chitalu.

}
}

static void set_bits32(cl::sycl::cl_uint *const dst, const cl::sycl::cl_uint src,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're setting a range of bits at a given offset.

read_bits32(mask.s0(), 0, k), read_bits32(mask.s1(), 0, k),
read_bits32(mask.s2(), 0, k), read_bits32(mask.s3(), 0, k));

for (int i = 0; i < 4; ++i) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emulating vector with four components

using namespace cl::sycl;
vec<gentypem_type, gentypem_size> ret;

const unsigned int k = 3;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the spec for the shuffle built-in, only the k-least significant bits are used to determine the value used to select the component from the vector x (or y),

static_assert(std::is_arithmetic<T>::value,
"Bitonic sort implementation only works with arithmetic types");
static_assert(
U == 4,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above.


relational_op_vec_type_ add1(1, 1, 3, 3);
relational_op_vec_type_ add2(2, 3, 2, 3);
relational_op_vec_type_ add3(4, 5, 6, 7);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See book: OpenCL in Action by Matthew Scarpino (Manning Publication)

typename bitonic_sort_base<T, U>::global_buffer_accessor_t m_globalBuf;
typename bitonic_sort_base<T, U>::local_buffer_accessor_t m_localBuf;
const unsigned int m_stage;
const int mDir;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorting direction. Zero means 'ascending', one means 'descending'.

Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to have a new sorting algorithm.

Just a few stylistic comments.

@@ -0,0 +1,152 @@
/* Copyright (c) 2015 The Khronos Group Inc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious about why 2015 here...

* the device.
* Note that for the moment the sycl variants of the algorithm
* are on the sycl namespace and not in std::experimental.
*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use SYCL when you are meaning the standard :-)

bool sorted = true;

std::cout << __FUNCTION__ << "<" << typename_as_str<T>::name_ << ">"
<< std::endl;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... "\n" sounds so old-fashioned, especially overkill compared to '\n'.
My stylistic C++ mood push me towards std::endl anyway... :-)

for (int i = minInputSizeLog2; i <= maxInputSizeLog2; ++i) {

std::vector<T> v;
v.resize(1 << i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious.
Why not

std::vector<T> v(1 << i);

?


std::cout << "in : ";
for (int j = 0; j < v.size(); ++j) {
v[j] = init_num(static_cast<T>((v.size() - 1) - j), v.size());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be a v.emplace_back(...) and then we could remove v pre-allocation...
But not that important for a test anyway...

const typename bitonic_sort_base<T, U>::local_buffer_accessor_t &localBuf,
const unsigned int stage, const int dir)
: m_globalBuf(globalBuf), m_localBuf(localBuf), m_stage(stage),
mDir(dir) {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ }

typedef typename bitonic_sort_base<T, U>::data_vec_type data_vec_type_;
typedef
typename bitonic_sort_base<T, U>::mask_op_vec_type mask_op_vec_type_;
typedef typename bitonic_sort_base<T, U>::relational_op_vec_type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using

relational_op_vec_type_ comp, add;
unsigned int global_start, global_offset;

add = relational_op_vec_type_(4, 5, 6, 7);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ }

cgh.parallel_for<kernel_bitonic_sort_phase_stage_n<T, Alloc>>(
ndrange, bitonic_sort_stage_n<T, 4>(g, l, stage, high_stage));
})
.wait();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious to have a wait()

input1 = shuffle2(input1, input2, (comp).template as<mask_op_vec_type_>());
input2 = shuffle2(input2, temp, (comp).template as<mask_op_vec_type_>());
VECTOR_SORT(input1, mDir);
VECTOR_SORT(input2, mDir);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By browsing some code that seems similar at different level accress different kernels, I wonder whether it could not be possible to have some even more generic code and code refactorizing...

@CLAassistant
Copy link

CLAassistant commented Sep 2, 2019

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Floyd seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants