Multi Model Server (MMS)
Environment Variable: description
job_queue_size: Tune this parameter when inference request payloads are large. The queue is maintained inside the JVM, so larger payloads mean higher JVM heap consumption. The JVM only receives the HTTP requests, queues them, and dispatches them to the Python-based workers for inference, so ideally its heap requirements stay low and the Python workers get more memory for actual model serving. If you increase job_queue_size, you may increase the JVM's heap consumption and take away host memory that the Python workers could otherwise have used. Therefore, exercise caution when tuning this parameter as well.
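To make the trade-off above concrete, here is a rough back-of-the-envelope sketch. It assumes, as a simplification, that every queued job holds one full request payload on the JVM heap, so the estimate is an approximate upper bound for the payloads alone and not a measurement of actual MMS behavior. The helper names `estimate_queue_heap_bytes` and `format_mib` are made up for this illustration, and the queue sizes and the 5 MiB payload are example values only.

```python
def estimate_queue_heap_bytes(job_queue_size: int, avg_payload_bytes: int) -> int:
    """Rough upper bound on heap held by queued payloads.

    Assumption (not taken from MMS internals): each queued job keeps one
    full request payload in JVM heap until a worker picks it up.
    """
    if job_queue_size < 0 or avg_payload_bytes < 0:
        raise ValueError("job_queue_size and avg_payload_bytes must be non-negative")
    return job_queue_size * avg_payload_bytes


def format_mib(num_bytes: int) -> str:
    """Format a byte count as mebibytes with one decimal place."""
    return f"{num_bytes / (1024 * 1024):.1f} MiB"


if __name__ == "__main__":
    payload_bytes = 5 * 1024 * 1024  # example: 5 MiB per request
    for queue_size in (100, 500, 1000):  # example queue sizes
        heap = estimate_queue_heap_bytes(queue_size, payload_bytes)
        print(f"job_queue_size={queue_size}: up to ~{format_mib(heap)} of queued payloads")
```

With large payloads the estimate grows linearly with the queue size, which is why a bigger queue can eat into the memory that the Python workers need. For small payloads the same arithmetic suggests the queue itself is unlikely to be the memory bottleneck.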
Hello Team 👋🏼
Concise Description:
Getting a 503 'ServiceUnavailableException' on SageMaker Realtime Inference Endpoint when calling the endpoint with multiple requests. Trying to find a way to increase the Job Queue Size as mentioned in the error.
DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-gpu-py310-cu118-ubuntu20.04
Current behavior:
SageMaker 503 error when sending multiple requests to the model.
No worker is available to serve request for model: model. Consider increasing job queue size.
Additional context:
Is there a document I could refer to that explains how to increase the Job Queue Size?