I wasn't sure whether to file this as a bug or whether it works as intended.
I'm currently facing an issue where a model with up to a hundred inputs, deployed via the Triton ONNX backend, shows a relatively high `nv_inference_compute_input_duration_us`. As I understand it, this metric also includes the time spent copying tensor data to the GPU. Is it possible that each input results in a separate GPU allocator call?
From what I see in `ModelInstanceState::SetInputTensors` (https://github.com/triton-inference-server/onnxruntime_backend/blob/main/src/onnxruntime.cc#L2273), inputs are processed sequentially, and each input results in a call to `CreateTensorWithDataAsOrtValue`. Could this lead to separate GPU allocations and copies for each input, and therefore to a long `nv_inference_compute_input_duration_us`? Or does the copy of tensor data to the GPU happen before a request is passed to the ONNX backend?
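To illustrate why I suspect the per-input path matters: below is a toy cost model, not Triton or ONNX Runtime code. All numbers (input count, tensor size, the ~20 µs fixed cost per transfer standing in for allocator/memcpy launch overhead) are assumptions I made up for illustration. It just shows how a fixed per-call cost multiplied by ~100 inputs can dominate the total input-preparation time compared to one batched transfer.

```python
# Hypothetical sketch (NOT Triton code): models the cost of issuing one
# GPU transfer per input versus a single batched transfer. The fixed
# per-call term stands in for allocation + copy-launch overhead.

NUM_INPUTS = 100              # assumption: ~100 model inputs, as in the issue
BYTES_PER_INPUT = 4096        # assumption: small tensors
PER_CALL_OVERHEAD_S = 20e-6   # assumption: ~20 us fixed cost per transfer
BYTES_PER_SECOND_INV = 1e-10  # assumption: toy bandwidth term (s per byte)

def copy_per_input(inputs):
    """Simulated cost of one transfer per input (the pattern I suspect
    SetInputTensors produces via CreateTensorWithDataAsOrtValue)."""
    total = 0.0
    for buf in inputs:
        total += PER_CALL_OVERHEAD_S             # fixed launch/alloc cost
        total += len(buf) * BYTES_PER_SECOND_INV # bandwidth-proportional cost
    return total

def copy_batched(inputs):
    """Simulated cost of one transfer for all inputs packed contiguously."""
    packed = b"".join(inputs)
    return PER_CALL_OVERHEAD_S + len(packed) * BYTES_PER_SECOND_INV

inputs = [bytes(BYTES_PER_INPUT) for _ in range(NUM_INPUTS)]
print(f"per-input: {copy_per_input(inputs) * 1e6:.0f} us")
print(f"batched  : {copy_batched(inputs) * 1e6:.0f} us")
```

With these made-up constants the per-input path is dominated by the 100 fixed per-call costs, which is the shape of overhead I'd expect to show up in `nv_inference_compute_input_duration_us` if each input really triggers its own allocation and copy.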