I recently began using clFFT (2.12.2, the pre-built version shipped with Ubuntu 17.10) to perform 2D convolution of images. However, when testing on the devices I currently have available, my code gives different results for the forward transform depending on the device I choose and on whether I attach a pre-callback. In particular, clFFT seems to be using incorrect input data.
A few more details:
I'm using an Intel GPU with Beignet on the one hand, and my Intel CPUs via pocl on the other.
I'm transforming two images of 200x200 each. To keep things simple I'm using in-place C2C transforms with a single plan for both (i.e., I enqueue two forward transforms with two different buffers).
I'm explicitly setting the plan batch size to 1, the scale to 1, the in/out strides to 1 and 200 (tightly packed), the plan precision to single, etc., and checking the return values of all those calls.
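To make the stride settings concrete, here is a minimal sketch of how strides of 1 and 200 map a pixel position to a position in a tightly packed buffer. It assumes strides are counted in complex elements rather than bytes; `flat_index` and `packed_span` are hypothetical helper names used only for illustration and are not part of clFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a clFFT function): flat position of element
 * (x, y) in a buffer whose strides are given in elements, not bytes. */
size_t flat_index(size_t x, size_t y, size_t stride_x, size_t stride_y)
{
    return x * stride_x + y * stride_y;
}

/* Number of elements spanned by a tightly packed width x height image,
 * i.e. the position of the last element plus one. */
size_t packed_span(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    /* Tightly packed: stride 1 along x, stride `width` along y. */
    return flat_index(width - 1, height - 1, 1, width) + 1;
}
```

With strides 1 and 200, `flat_index(0, 1, 1, 200)` is 200, so each row starts right after the previous one, and `packed_span(200, 200)` is 40000 complex elements per image.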
The operations I'm then doing are as follows:
I create a R/W buffer for each image. They potentially live next to each other in device memory.
I write both images to their corresponding buffer using the correct data type.
Using the events from the writes above, I enqueue the corresponding forward transformations.
As a result, both buffers now contain the forward transform of the second image. This happens only with the Intel/Beignet device; the CPUs/pocl seem happy.
I added a pre-callback to make sure the data stored in device memory was actually correct before the forward transform ran. Adding the pre-callback made the problem disappear. Even an empty callback that simply returns input[offset] does the trick.
I've generated the kernels resulting from the different combinations (Intel vs. pocl, and no pre-callback vs. pre-callback), so I get 8 files (there is a Stockham2 and a Stockham3 file for each combination above). All Stockham3 kernels look identical, but the Stockham2 kernels differ a bit between the no-pre-callback and the pre-callback cases (while being identical across the two devices in each case).
Thanks for the detailed report, but this is beyond our (clFFT maintainers) normal scope of support, and clFFT is mostly in maintenance mode, with only critical issues being looked at. I hope someone in the community with experience in Beignet/Intel GPUs can help you.
Thanks a lot for your reply, and for clarifying the actual state of the project. I think I will hold off on using clFFT for the moment, until I have more time to investigate this issue on my side.
Kernel files attached to the original report:
clfft.kernel.Stockham3.cl-intel-cb.txt
clfft.kernel.Stockham2.cl-intel-cb.txt
clfft.kernel.Stockham3.cl-pocl-cb.txt
clfft.kernel.Stockham2.cl-pocl-cb.txt
clfft.kernel.Stockham3.cl-pocl-nocb.txt
clfft.kernel.Stockham2.cl-pocl-nocb.txt
clfft.kernel.Stockham3.cl-intel-nocb.txt
clfft.kernel.Stockham2.cl-intel-nocb.txt
Please let me know if you need more information, or if I'm missing something terribly obvious.