Code Skeleton #1

Open
crtrott opened this issue Jun 9, 2017 · 1 comment

Comments

@crtrott
Collaborator

crtrott commented Jun 9, 2017

// Loop over a batch of events and their 24 sectors with teams (CUDA: blocks)
parallel_for(TeamPolicy<>(num_sector * num_events, team_size, 32),
             KOKKOS_LAMBDA (const TeamPolicy<>::member_type& team) {
  // Assumed mapping of the league rank to (event, sector)
  const int event  = team.league_rank() / num_sector;
  const int sector = team.league_rank() % num_sector;

  // Loop over all rows with threads in the team (threadIdx.y in CUDA speak)
  parallel_for(TeamThreadRange(team, num_rows), [&] (int row) {
    // Loop until the blob IDs are all assigned and no longer change
    bool not_done = true;
    while (not_done) {
      not_done = false;  // must be set back to true (via a reduction over the lanes) if any ID changed
      // First pair even pads with pad+1, then odd pads, so no two lanes write the same pad
      for (int uneven = 0; uneven <= 1; uneven++) {
        // Do a vector loop over disjoint pad pairs (CUDA: threads in a warp);
        // the pair count keeps pad+1 < num_pads(row)
        parallel_for(ThreadVectorRange(team, (num_pads(row) - uneven) / 2), [&] (int pad_half) {
          const int pad = uneven + 2 * pad_half;
          for (int sig_pad = 0; sig_pad < num_signals(event, sector, row, pad); sig_pad++) {
            for (int sig_pad2 = 0; sig_pad2 < num_signals(event, sector, row, pad + 1); sig_pad2++) {
              // Mark neighbors
            }
          }
        });
      }
    }
    // Loop over signals to find the unique number of blobs
    parallel_for(ThreadVectorRange(team, num_signals), [&] (int signal) {
      ...
    });
    // Build the signal-to-blob map
    parallel_for(ThreadVectorRange(team, num_signals), [&] (int signal) {
      blob_id(signal) = blob_id(blob_id(signal));
    });
  });
});
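// Loop over a batch of events with teams (CUDA: blocks)
parallel_for(TeamPolicy<>(num_events, team_size, 32),
             KOKKOS_LAMBDA (const TeamPolicy<>::member_type& team) {
  const int event = team.league_rank();

  // Build the blob-to-signal map
  parallel_for(TeamThreadRange(team, num_blobs(event)), [&] (int blob) { /* ... */ });
  team.team_barrier();  // assumed: the signal loop reads what the blob loop wrote
  parallel_for(TeamThreadRange(team, num_signals(event)), [&] (int signal) { /* ... */ });
  team.team_barrier();  // the map must be complete before it is used below

  // Compute properties of blobs
  parallel_for(TeamThreadRange(team, num_blobs(event)), [&] (int blob) {
    for (int iSignal = 0; iSignal < num_signals(blob); iSignal++) { /* ... */ }
  });
});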
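For experimenting with the first kernel's row-level logic off the GPU, here is a minimal serial sketch in standard C++ (no Kokkos). It assumes a hypothetical layout in which one row's signals are stored pad by pad (`pad_begin` offsets, `time_bin` values) and a hypothetical neighbor rule (signals on adjacent pads whose time bins differ by at most 1); `label_row`, `mark_neighbors` and `are_neighbors` are illustrative names, not part of the original code. Because the skeleton leaves "Mark neighbors" empty, the merging rule here (min-label propagation) is also an assumption. The sketch shows the even/odd pad pairing, the repeat-until-stable loop, and a final lookup through the labels in the spirit of `blob_id(blob_id(signal))`.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Hypothetical layout: signals on pad p are indices pad_begin[p] .. pad_begin[p+1]-1,
// and time_bin[s] is the time bin of signal s.
// Hypothetical neighbor rule: adjacent pads, time bins at most 1 apart.
bool are_neighbors(int t1, int t2) { return std::abs(t1 - t2) <= 1; }

// "Mark neighbors" for pads p and p+1: give touching signals the smaller of their labels.
// Only these two pads are written, so disjoint pad pairs never conflict.
bool mark_neighbors(int p, const std::vector<int>& pad_begin,
                    const std::vector<int>& time_bin, std::vector<int>& blob_id) {
  bool changed = false;
  for (int s1 = pad_begin[p]; s1 < pad_begin[p + 1]; ++s1) {
    for (int s2 = pad_begin[p + 1]; s2 < pad_begin[p + 2]; ++s2) {
      if (!are_neighbors(time_bin[s1], time_bin[s2])) continue;
      const int m = blob_id[s1] < blob_id[s2] ? blob_id[s1] : blob_id[s2];
      if (blob_id[s1] != m || blob_id[s2] != m) {
        blob_id[s1] = m;
        blob_id[s2] = m;
        changed = true;
      }
    }
  }
  return changed;
}

// Labels one row; returns the number of blobs and fills blob_of[s] with a dense blob index.
int label_row(const std::vector<int>& pad_begin, const std::vector<int>& time_bin,
              std::vector<int>& blob_of) {
  assert(!pad_begin.empty());
  const int num_pads = static_cast<int>(pad_begin.size()) - 1;
  const int num_signals = pad_begin[num_pads];
  assert(static_cast<int>(time_bin.size()) == num_signals);

  std::vector<int> blob_id(num_signals);
  for (int s = 0; s < num_signals; ++s) blob_id[s] = s;  // each signal starts as its own blob

  bool not_done = true;
  while (not_done) {  // repeat until no blob ID changes
    not_done = false;
    for (int uneven = 0; uneven <= 1; ++uneven) {
      // Pairs (uneven, uneven+1), (uneven+2, uneven+3), ... are disjoint, so each could
      // go to its own vector lane; the bound p + 1 < num_pads keeps pad p+1 in range.
      for (int p = uneven; p + 1 < num_pads; p += 2) {
        if (mark_neighbors(p, pad_begin, time_bin, blob_id)) not_done = true;
      }
    }
  }

  // Roots (blob_id[s] == s) are the unique blobs; number them densely.
  std::vector<int> dense(num_signals, -1);
  int num_blobs = 0;
  for (int s = 0; s < num_signals; ++s)
    if (blob_id[s] == s) dense[s] = num_blobs++;

  // Signal-to-blob map: one lookup through the label, analogous to blob_id(blob_id(signal)).
  blob_of.assign(num_signals, 0);
  for (int s = 0; s < num_signals; ++s) blob_of[s] = dense[blob_id[s]];
  return num_blobs;
}
```

In the Kokkos version each pad pair of one pass would map to a `ThreadVectorRange` lane, and `not_done` would have to be combined across lanes with a reduction rather than written directly.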
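A blob-to-signal map like the one the second kernel builds is commonly produced by a counting sort: count signals per blob, take an exclusive prefix sum to get offsets, then scatter signal indices. The serial sketch below illustrates that technique under the assumption that the skeleton's two "build" loops play these roles; `BlobToSignals`, `build_blob_to_signal_map`, `blob_charge` and the per-signal `charge` are hypothetical names, and the charge sum merely stands in for whatever blob properties the real code computes.

```cpp
#include <cassert>
#include <vector>

// CSR-style blob -> signal map (hypothetical structure).
struct BlobToSignals {
  std::vector<int> offsets;  // size num_blobs + 1; blob b owns signals[offsets[b] .. offsets[b+1])
  std::vector<int> signals;  // signal indices grouped by blob
};

BlobToSignals build_blob_to_signal_map(const std::vector<int>& blob_of, int num_blobs) {
  BlobToSignals m;
  m.offsets.assign(num_blobs + 1, 0);
  for (int b : blob_of) {
    assert(b >= 0 && b < num_blobs);
    ++m.offsets[b + 1];  // count signals per blob
  }
  for (int b = 0; b < num_blobs; ++b)  // prefix sum turns counts into start offsets
    m.offsets[b + 1] += m.offsets[b];
  m.signals.resize(blob_of.size());
  std::vector<int> cursor(m.offsets.begin(), m.offsets.end() - 1);  // next free slot per blob
  for (int s = 0; s < static_cast<int>(blob_of.size()); ++s)
    m.signals[cursor[blob_of[s]]++] = s;  // scatter each signal into its blob's slot
  return m;
}

// Example blob property: total (hypothetical) charge per blob, one outer iteration per blob.
std::vector<double> blob_charge(const BlobToSignals& m, const std::vector<double>& charge) {
  const int num_blobs = static_cast<int>(m.offsets.size()) - 1;
  std::vector<double> total(num_blobs, 0.0);
  for (int b = 0; b < num_blobs; ++b)
    for (int i = m.offsets[b]; i < m.offsets[b + 1]; ++i)
      total[b] += charge[m.signals[i]];
  return total;
}
```

A parallel version would compute the offsets with a scan (Kokkos provides `parallel_scan`) and fill with an atomically incremented cursor, in which case the order of signals within a blob is no longer deterministic.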
@crtrott
Collaborator Author

crtrott commented Jun 9, 2017

Name                          Time (s)  Calls  Time/call (s)  % of Kokkos time  % of total time
init (REGION)                  0.95896      1        0.95896           346.373           67.425
Kokkos::View::initialization   0.16979   5136        0.00003            61.327           11.938
real_work (REGION)             0.16008      1        0.16008            57.819           11.255
blob_preparation (REGION)      0.06439      1        0.06439            23.259            4.528
sector loop                    0.05419      1        0.05419            19.575            3.810
alloc_blob_data (REGION)       0.03824    512        0.00007            13.813            2.689
do_peaks                       0.02312      1        0.02312             8.351            1.626
comp_global_offset             0.02048    512        0.00004             7.399            1.440
init_blob_id (REGION)          0.00927    512        0.00002             3.349            0.652
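The last two columns can be checked against the summary totals: each is an entry's total time divided by the total time in Kokkos kernels (0.27686 s) and by the total execution time (1.42227 s), while the per-call column is total time divided by the number of calls. Regions can exceed 100 % of Kokkos time because they measure wall time that also covers code outside Kokkos kernels. For example, for `init` and for `Kokkos::View::initialization`:

```latex
\frac{0.95896\,\mathrm{s}}{0.27686\,\mathrm{s}} \times 100 \approx 346.4\,\%, \qquad
\frac{0.95896\,\mathrm{s}}{1.42227\,\mathrm{s}} \times 100 \approx 67.4\,\%, \qquad
\frac{0.16979\,\mathrm{s}}{5136} \approx 3.3 \times 10^{-5}\,\mathrm{s}
```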

-------------------------------------------------------------------------
Summary:

Total Execution Time (incl. Kokkos + Non-Kokkos):             1.42227 seconds
Total Time in Kokkos kernels:                                 0.27686 seconds
   -> Time outside Kokkos kernels:                            1.14541 seconds
   -> Percentage in Kokkos kernels:                             19.47 %
Total Calls to Kokkos Kernels:                                   6162
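The summary lines are consistent with each other: the time outside kernels is the difference of the first two totals and the percentage is their ratio, so roughly four fifths of this run is spent outside Kokkos kernels:

```latex
1.42227\,\mathrm{s} - 0.27686\,\mathrm{s} = 1.14541\,\mathrm{s}, \qquad
\frac{0.27686\,\mathrm{s}}{1.42227\,\mathrm{s}} \times 100 \approx 19.47\,\%
```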
