Is your feature request related to a problem? Please describe.
I'm having a problem with warming up the model.
Currently, the config.pbtxt provides a warmup section, but I'd like to warm up the model for many different batch sizes, which would make the config.pbtxt very large. I use the C API, so I'd like to do it either manually in code, by sending InferRequests after model load, or by somehow dynamically providing the model config to the LoadModel function (currently not possible).
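For context, a single warmup entry in config.pbtxt looks roughly like the sketch below (following my reading of the ModelWarmup message in Triton's model_config.proto; the input name "INPUT0" and the dims are placeholders). One such block is needed per batch size, which is what makes the file blow up:

```
model_warmup [
  {
    name: "warmup_bs1"
    batch_size: 1
    inputs: {
      key: "INPUT0"
      value: {
        data_type: TYPE_FP32
        dims: [ 3, 224, 224 ]
        zero_data: true
      }
    }
  }
  # ...and again for batch_size 2, 4, 8, ... - the config grows with
  # every batch size I want warmed up
]
```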
Doing it manually in code would look like:

```cpp
// Build one warmup request using the server wrapper API
auto request = tds::InferRequest::Create(tds::InferOptions(model_name_));
for (auto& inp : model_inputs_) {
  // begin/end point at a dummy input buffer of the right size
  request->AddInput(inp, begin, end, tds::DataType::FP32, input_shape,
                    tds::MemoryType::CPU, 0);
}
auto result = server_->Infer(*request);
```
The problem here is that I cannot specify the device (the system has multiple GPUs), or more precisely, the model instance to send the InferRequest to. Is that possible using the C++ API? If yes, it would solve my issue.
The other way could be the ModelWarmup config, which runs warmup per model instance, which is good. But I cannot specify it in any way other than the config.pbtxt. I'd like to specify it on LoadModel, or in some other dynamic way from code. Is that possible with the C++ API, without editing the config.pbtxt file?
Describe the solution you'd like
A way to warm up the model for all instances (which run on different devices), either via the warmup feature or by manually specifying the target model instance of an InferRequest.
asaff1 changed the title from "Manual warmup / specify warmup using c api" to "Manual warmup per model instance / specify warmup config dynamically using c api" on Dec 16, 2024