optimize the Vashishta potentials
The analytical and tabulated versions of the Vashishta potential have been optimized.

Speed (Tesla K40; beta-SiC with 8000 atoms at 300 K and 0 GPa), in units of 1.0e6 atom*step/second:
Before the optimization:
-- Analytical: 2.9
-- Tabulated: 4.6 (table length = 20 000; relative force error = 1.0e-6)
After the optimization:
-- Analytical: 3.3
-- Tabulated: 6.1 (table length = 20 000; relative force error = 1.0e-6)
brucefan1983 authored Mar 4, 2018
1 parent: 78bdfbd; commit: e1818fb
Showing 5 changed files with 445 additions and 309 deletions.
src/common.h (1 addition, 0 deletions)
@@ -311,6 +311,7 @@ typedef struct
    real *thermo; // some thermodynamic quantities
    real *b; real *bp; // for bond-order potentials
    real *fv; real *fv_all; // for SHC calculations
    real *f12x, *f12y, *f12z; // partial force for many-body potentials
} GPU_Data;


src/finalize.cu (8 additions, 0 deletions)
@@ -74,6 +74,14 @@ void finalize(Force_Model *force_model, CPU_Data *cpu_data, GPU_Data *gpu_data)
        CHECK(cudaFree(force_model->vas_table.table));
    }

    // for SW and Tersoff type potentials
    if (force_model->type >= 30)
    {
        CHECK(cudaFree(gpu_data->f12x));
        CHECK(cudaFree(gpu_data->f12y));
        CHECK(cudaFree(gpu_data->f12z));
    }

    // Free the major memory allocated on the CPU
    MY_FREE(cpu_data->NN);
    MY_FREE(cpu_data->NL);
src/potential.cu (14 additions, 0 deletions)
@@ -837,6 +837,20 @@ void process_potential
        CHECK(cudaMalloc((void**)&gpu_data->b, memory));
        CHECK(cudaMalloc((void**)&gpu_data->bp, memory));
    }

    // for SW and Tersoff type potentials
    if (force_model->type >= 30)
    {
        // Assume that there are at most 20 neighbors for the many-body part;
        // this should be more than enough.
        // I do not change 20 to para->neighbor.MN because MN can be very large
        // in some cases.
        int memory = sizeof(real) * para->N * 20;
        CHECK(cudaMalloc((void**)&gpu_data->f12x, memory));
        CHECK(cudaMalloc((void**)&gpu_data->f12y, memory));
        CHECK(cudaMalloc((void**)&gpu_data->f12z, memory));
    }

}

