Hi, I'm working on deploying a private language model to production through Replicate. Requests come in sporadically, so provisioning always-on servers isn't feasible for me, but I'd still like requests to be handled at my maximum concurrency for speed. Currently I face 2-2.5 minutes of cold start for each instance, and instances terminate after just 1 minute of idle time, which leads to frustrating delays that are longer than necessary.
Would it be possible to add either of these functionalities?
1. An API to force-boot n instances together: this reduces the spread of boot times and gives more control to start the boot process early, before requests actually need to be processed.
2. Custom idle time limits: the limit needs to be at least as long as the boot time. I wouldn't mind paying for some extra uptime if it meant avoiding stop-start behaviour in the middle of a chunk being processed.
Currently I'm attempting a workaround for no. 1 by burst-pinging the model early with the default input n times (sketch below), but the short idle time means there's still a good chance the instances get terminated before I send any actual requests. Let me know if you have a better solution.
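For concreteness, this is roughly what my burst-ping looks like with the Python client (a sketch; the model identifier and input are placeholders for my private model):

```python
# Sketch of the burst-ping workaround: fire n concurrent "warm-up"
# predictions so Replicate boots n instances ahead of real traffic.
# The model identifier and input below are placeholders.
from concurrent.futures import ThreadPoolExecutor

import replicate

N = 4  # target number of warm instances

def ping(_):
    # A minimal prediction whose only purpose is to trigger a boot.
    return replicate.run(
        "owner/private-model:version-id",  # placeholder identifier
        input={"prompt": "Respond with a single character."},
    )

with ThreadPoolExecutor(max_workers=N) as pool:
    list(pool.map(ping, range(N)))
```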
Thanks!
To your second point, you can get more control over the behavior of a model on Replicate by creating a deployment. I don't believe we provide a way to configure the timing for autoscaling a deployment, but that's something we've discussed.
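For anyone else reading: calling a model through a deployment looks roughly like the sketch below (the deployment name and input are placeholders, and the exact client calls may differ by client version):

```python
# Sketch: calling a model through a Replicate deployment instead of
# the public model endpoint. The deployment name is a placeholder.
import replicate

deployment = replicate.deployments.get("acme/my-llm-deployment")

prediction = deployment.predictions.create(
    input={"prompt": "Hello"},
)
prediction.wait()  # block until the prediction finishes
print(prediction.output)
```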
I've built a workaround that simply pings the model (to minimize tokens, I ask it to respond with a single character). Each ping takes about 0.1 seconds of runtime, and I set the ping frequency to 1 minute. After 15 minutes of inactivity the pings stop. While it doesn't solve Replicate's initial cold-boot issue, this keeps the model warm while a user is active at negligible cost, far cheaper than creating a dedicated deployment.
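The gist, as a sketch (the model identifier is a placeholder, and updating the last-activity timestamp from the real request path is elided):

```python
# Sketch of the keep-warm pinger: after a real user request, ping the
# model once a minute until 15 minutes have passed without activity.
# The model identifier is a placeholder.
import time

import replicate

PING_INTERVAL = 60          # seconds between pings
INACTIVITY_LIMIT = 15 * 60  # stop pinging after 15 idle minutes

last_user_request = time.monotonic()  # updated by the real request path

def keep_warm():
    while time.monotonic() - last_user_request < INACTIVITY_LIMIT:
        # Minimal-token ping: ask for a single character back.
        replicate.run(
            "owner/private-model:version-id",  # placeholder identifier
            input={"prompt": "Respond with a single character."},
        )
        time.sleep(PING_INTERVAL)

keep_warm()
```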