
[question] huggingface-pytorch-inference image - increase Job Queue Size #4478

Open
anshgandhi opened this issue Jan 4, 2025 · 2 comments

@anshgandhi
Hello Team 👋🏼

Concise Description:
Getting a 503 'ServiceUnavailableException' from a SageMaker Realtime Inference endpoint when calling it with multiple concurrent requests. Trying to find a way to increase the Job Queue Size, as suggested in the error message.

DLC image/dockerfile:
763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:2.1.0-transformers4.37.0-gpu-py310-cu118-ubuntu20.04

Current behavior:
SageMaker returns a 503 error when sending multiple requests to the model:
No worker is available to serve request for model: model. Consider increasing job queue size.

Additional context:
Is there a document I could refer to that explains how to increase the Job Queue Size?

@bencrabtree
Thanks for your comment. Here's the documentation that shows how to increase job_queue_size, under the FAQ entry "Q: What are the common tunable environment variables for SageMaker AI containers?": https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference-troubleshooting.html.

Multi Model Server (MMS)

job_queue_size: This parameter is useful to tune when the inference request payloads are large; larger payloads increase the heap memory consumption of the JVM in which this queue is maintained. Ideally you want to keep the JVM's heap memory requirements low and let the Python workers allot more memory for actual model serving, since the JVM is only used to receive HTTP requests, queue them, and dispatch them to the Python-based workers for inference. If you increase job_queue_size, you might end up increasing the JVM's heap memory consumption and ultimately taking memory away from the host that could have been used by the Python workers. Therefore, exercise caution when tuning this parameter as well.
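A minimal sketch of one way to pass this variable to the container when deploying with the SageMaker Python SDK. The role ARN, model artifact path, queue size value, and instance type below are placeholders, and it assumes the DLC forwards entries from the env dict to the MMS configuration as the linked doc describes:

```python
from sagemaker.huggingface import HuggingFaceModel

# Placeholder execution role -- substitute your own.
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"

model = HuggingFaceModel(
    model_data="s3://my-bucket/model.tar.gz",  # assumed artifact location
    role=role,
    image_uri=(
        "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:"
        "2.1.0-transformers4.37.0-gpu-py310-cu118-ubuntu20.04"
    ),
    env={
        # MMS tunable from the linked troubleshooting doc; assumed to be
        # forwarded by the container to the model server configuration.
        "job_queue_size": "500",
    },
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.xlarge",  # placeholder instance type
)
```

Since the environment is fixed at model creation, changing the value on an existing endpoint would mean creating a new model with the updated env and updating the endpoint to a new endpoint config that references it.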

@anshgandhi
Author

Thanks @bencrabtree - I wasn't sure whether these variables would also work for Realtime Inference. Will give them a try :)
