Different forward transforms in different devices #210

Open
rtobar opened this issue Nov 30, 2017 · 3 comments

rtobar commented Nov 30, 2017

Hi,

I recently began using clFFT (2.12.2, the pre-built version shipped with Ubuntu 17.10) to perform 2D convolution of images. However, when testing on my currently available devices, my code gives different results for the forward transform depending on the device I choose and on whether I attach a pre-callback. In particular, clFFT seems to be using the wrong input data.

A few more details:

  • I'm using an Intel GPU with Beignet on the one hand, and my Intel CPUs via pocl on the other.
  • I'm transforming 2 images of 200x200 each. To keep things simple I'm using in-place C2C transforms, with a single plan for both (i.e., I issue two forward transforms with two different buffers).
  • I'm explicitly setting the plan batch size to 1, the scale to 1, the in/out strides to 1 and 200 (tightly packed rows, no padding), the plan precision to single, etc., and checking the return values of all those calls.
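
As a rough illustration of the "strides 1 and 200" setting, the sketch below computes where pixel (x, y) of a tightly packed 200x200 image lands in its buffer. The helper names are hypothetical, and the 8 bytes per element is an assumption that follows from single precision with interleaved real/imaginary parts.

```python
# Hypothetical helper: layout of one tightly packed 200x200 complex image.
WIDTH, HEIGHT = 200, 200
STRIDES = (1, WIDTH)          # elements between neighbours along x and along y
BYTES_PER_COMPLEX = 2 * 4     # assumed: interleaved real/imag, 32-bit floats


def element_index(x, y, strides=STRIDES):
    """Index (in complex elements) of pixel (x, y) within one image."""
    return x * strides[0] + y * strides[1]


def buffer_bytes(width=WIDTH, height=HEIGHT):
    """Bytes needed for one image with no padding between rows."""
    return width * height * BYTES_PER_COMPLEX
```

Under these assumptions the last pixel sits at element 39999 and each buffer takes 320000 bytes, so any overlap between the two buffers would have to come from the runtime rather than from the strides.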

The operations I then perform are as follows:

  • I create a R/W buffer for each image. They potentially live next to each other in the device's memory.
  • I write both images to their corresponding buffer using the correct data type.
  • Using the events from the writes above, I enqueue the corresponding forward transformations.
  • As a result, both buffers now contain the forward transform of the second image. This happens only with the Intel/Beignet device; the CPU/pocl device seems happy.
  • I added a pre-callback to check that the data stored in device memory was actually correct before the forward transform ran. Adding the pre-callback made the problem disappear. Even a trivial callback that simply returns input[offset] does the trick.
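
One way to confirm the symptom independently of any device is to compare each read-back buffer against a CPU reference. The following is a minimal sketch, not part of the original code: it assumes the conventional negative-exponent forward transform with scale 1, takes images as lists of rows of complex numbers, and uses hypothetical function names plus an arbitrary absolute tolerance that would need tuning for real single-precision data.

```python
import cmath


def dft_1d(values):
    """Naive O(n^2) forward DFT with no scaling (scale factor 1)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * j / n) for j, v in enumerate(values))
        for k in range(n)
    ]


def dft_2d(image):
    """Forward 2D DFT of a list of rows: transform rows, then columns."""
    rows = [dft_1d(row) for row in image]
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def max_abs_diff(a, b):
    """Largest element-wise magnitude difference between two 2D arrays."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def diagnose(out1, out2, ref1, ref2, tol=1e-3):
    """Classify two device outputs against their CPU references."""
    ok1 = max_abs_diff(out1, ref1) <= tol
    ok2 = max_abs_diff(out2, ref2) <= tol
    if ok1 and ok2:
        return "both correct"
    if not ok1 and max_abs_diff(out1, ref2) <= tol:
        return "first buffer holds the transform of the second image"
    return "mismatch"
```

Here `out1` and `out2` would be the buffers read back from the device, and `ref1` and `ref2` the results of `dft_2d` on the two input images. The naive transform is slow for 200x200 inputs in pure Python, so a smaller test image may be more practical for a quick check.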

I've dumped the kernels generated for the different combinations (Intel vs. pocl, and no pre-callback vs. pre-callback), which gives 8 files (a Stockham2 and a Stockham3 file for each combination). All Stockham3 kernels look identical, but the Stockham2 kernels differ slightly between the non-pre-callback and pre-callback cases (while being identical across the two devices in each case):

clfft.kernel.Stockham3.cl-intel-cb.txt
clfft.kernel.Stockham2.cl-intel-cb.txt
clfft.kernel.Stockham3.cl-pocl-cb.txt
clfft.kernel.Stockham2.cl-pocl-cb.txt
clfft.kernel.Stockham3.cl-pocl-nocb.txt
clfft.kernel.Stockham2.cl-pocl-nocb.txt
clfft.kernel.Stockham3.cl-intel-nocb.txt
clfft.kernel.Stockham2.cl-intel-nocb.txt
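
For anyone wanting to re-check which of these dumps are byte-identical, a small sketch along the following lines could group them by content. The function name is hypothetical, and it assumes the files have already been read into a dict of name to source text.

```python
import hashlib
from collections import defaultdict


def group_identical(sources):
    """Group source names by identical content.

    `sources` is a dict of {name: kernel source text}; the result is a
    sorted list of sorted name lists, one list per distinct content.
    """
    groups = defaultdict(list)
    for name, text in sources.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        groups[digest].append(name)
    return sorted(sorted(names) for names in groups.values())
```

If the observation above holds, the four Stockham3 files would end up in one group, and the Stockham2 files would split into a pre-callback group and a non-pre-callback group.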

Please let me know if you need more information, or if I'm missing something terribly obvious.


rtobar commented Feb 9, 2018

ping

@bragadeesh
Member

Thanks for the detailed report, but this is beyond our (the clFFT maintainers') normal scope of support, and clFFT is mostly in maintenance mode, with only critical issues looked at. I hope someone in the community with Beignet/Intel GPU experience can help you.


rtobar commented Feb 10, 2018

Hi @bragadeesh,

Thanks a lot for your reply, and for clarifying the current state of the project. I think I will hold off on using clFFT for the moment, until I have more time to investigate this issue on my side.
