feat: instrument served apps with Prometheus metrics #31
Conversation
Force-pushed from 3708ffe to 07c7e4e
@squat I'd recommend manually testing it out with a basic Python server and trying
Force-pushed from 07c7e4e to 924ca6f
LGTM, but at least one test is failing with a 503 (could be unrelated, but a restart didn't help).
/subscribe
Force-pushed from 183312d to d490a21
Re-posting a conversation I had with @efiop on diagnosing the mysterious test failures we would get when we tried to import the
Thanks @chamini2 for the pointers that led to this fix!
```python
# TODO(squat): handle shutdowns gracefully.
# You cannot add signal handlers to any loop if you're not
# on the main thread.
# How can we detect that we are being shut down and stop the
# uvicorn servers gracefully?
# loop = asyncio.get_running_loop()
# loop.add_signal_handler(signal.SIGINT, event.set)
# loop.add_signal_handler(signal.SIGTERM, event.set)
```
Have you considered running them in separate threads? I'm also curious whether there are any implications to running two servers in the same event loop (and therefore on basically the same thread?).
We run our two servers in the same event loop on all of our isolate cloud instances. We could run them in different threads, but that doesn't really fix the issue commented out here at all. We'd have to fix signal handling in the isolate agent so it signals the thread in this fal app using a threading event; this fal app could then decide what to do with that event, e.g. set an asyncio event so the async servers can somehow call stop on the uvicorn workers. In short, threading vs. async doesn't change this graceful-shutdown problem at all, and I prefer async for our multiple servers. Some of our internal fal isolate servers have more than three servers running in the same event loop: gRPC, HTTP, metrics, etc. Eventually I'd like to define our async-friendly uvicorn server class here in fal-ai/fal and import it in our internal tooling.
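The bridging described above (a threading event set by the agent, relayed into an asyncio event that the servers in the loop wait on) can be sketched with the standard library alone. `bridge_shutdown` and the timer-driven agent stand-in are hypothetical names for illustration, not fal's actual API:

```python
import asyncio
import threading

def bridge_shutdown(stop_threading: threading.Event,
                    loop: asyncio.AbstractEventLoop) -> asyncio.Event:
    """Return an asyncio.Event that is set whenever stop_threading is set.

    A small watcher thread blocks on the threading.Event and schedules
    the asyncio.Event's set() onto the loop thread-safely.
    """
    stop_async = asyncio.Event()

    def watch() -> None:
        stop_threading.wait()  # blocks this helper thread, not the loop
        loop.call_soon_threadsafe(stop_async.set)

    threading.Thread(target=watch, daemon=True).start()
    return stop_async

async def main() -> None:
    stop_threading = threading.Event()
    loop = asyncio.get_running_loop()
    stop_async = bridge_shutdown(stop_threading, loop)

    # Simulate the agent signalling shutdown from another thread.
    threading.Timer(0.1, stop_threading.set).start()

    await stop_async.wait()
    # At this point each uvicorn server in the loop could be told to
    # stop and drain its connections.
    print("shutting down gracefully")

asyncio.run(main())
```

The same asyncio event can be awaited by any number of servers sharing the loop, which is why the choice of async over threads doesn't change the shutdown design.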
Oh, I didn't know we have an additional issue with the agent. Not a blocker for me; we can fix this when we need to.
This commit adds basic Prometheus instrumentation of the HTTP server of all applications. This lets us conveniently monitor how models are performing.

Signed-off-by: Lucas Servén Marín <[email protected]>
Force-pushed from d490a21 to c3e03cc
`gather` doesn't cancel other tasks automatically, e.g. causing metrics server to keep running if user app failed to `setup()`. Caused by #31
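The failure mode noted above is easy to reproduce with the standard library: `asyncio.gather` propagates the first exception but leaves sibling tasks running. A minimal sketch of one fix, where `run_all`, `metrics_server`, and `user_app` are hypothetical stand-ins rather than the code from this PR:

```python
import asyncio

async def metrics_server() -> None:
    # Stands in for the long-running metrics endpoint.
    await asyncio.sleep(3600)

async def user_app() -> None:
    # Stands in for a user app whose setup() fails.
    raise RuntimeError("setup() failed")

async def run_all(*coros) -> None:
    """Run coroutines together; cancel the rest when any one fails.

    Plain gather() would raise here too, but without the finally block
    the metrics task would keep running in the background.
    """
    tasks = [asyncio.ensure_future(c) for c in coros]
    try:
        await asyncio.gather(*tasks)
    finally:
        for t in tasks:
            t.cancel()  # no-op for tasks that already finished

async def main() -> None:
    try:
        await run_all(metrics_server(), user_app())
    except RuntimeError as exc:
        print(f"caught: {exc}")

asyncio.run(main())
```

On Python 3.11+, `asyncio.TaskGroup` provides this sibling-cancellation behaviour natively and would be the idiomatic choice.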
This commit adds basic Prometheus instrumentation of the HTTP server of all applications served with `serve=True`. This lets us conveniently monitor how models are performing.

Signed-off-by: Lucas Servén Marín [email protected]