
Messaging integration (GCP PubSub, AWS SQS, Kafka, etc) #88

Merged
merged 15 commits into main from pubsub on Apr 1, 2024
Conversation

nstogner
Contributor

@nstogner nstogner commented Mar 30, 2024

Add messaging integration (consume requests and produce responses via a messaging system).

Implemented via gocloud package to allow for future cross-cloud support.

Also refactors configuration to use environment variables exclusively.

Fixes #86

@samos123
Contributor

I was thinking that, to keep it simple initially, it might be easier to listen to the topic and curl localhost. That way we can reuse all the existing logic instead of implementing it twice. We can optimize later.

@nstogner
Contributor Author

We need a way of determining how many concurrent messages to process and when to stop pulling new messages off of the subscription. The easiest way is to invoke the functions directly, because that's where we have the info. The alternative would be to always process a given number of requests concurrently and wait for the proxy handler to return. I see that as being harder to debug, and it also adds another layer of concurrency settings.

@nstogner
Contributor Author

nstogner commented Mar 30, 2024

NOTE: Currently the code expects a request message that looks like the following, which is slightly different from the issue's description.

{
  # Standard OpenAI fields
  "model": "...",
  "prompt": "What is the ...",

  # Lingo-specific subscriber fields
  "path": "/v1/completions",
  "metadata": {
    "optional-key": "optional-val"
  }
}

I am going back and forth on whether we should nest all of the OpenAI fields under .body. It might be a good idea to avoid collisions with future OpenAI APIs, and also so that users of languages with well-defined types can use OpenAI-native types for the .body field.

I am planning on making the update to nest under .body soon. @samos123, thoughts?

@nstogner
Contributor Author

nstogner commented Mar 30, 2024

A few changes are still needed:

  • Add subscriber instantiation to main.go. [Update: DONE]
  • Add concurrent message handling. [Update: DONE]
  • Test against a live pubsub topic. [Update: DONE]

@samos123
Contributor

I do think it's the best longer-term approach so we can have more control over queueing. It's important that we only ack messages that have had their response sent back to a PubSub topic, so we would still have to wait. I think there might be a timeout from PubSub by which it needs an ack.

@nstogner
Contributor Author

nstogner commented Mar 30, 2024

Appears to be a race somewhere in the integration tests (only fails sometimes):

    util_test.go:67: 
                Error Trace:    /Users/nick/work/substratusai/lingo.pubsub/tests/integration/util_test.go:71
                                                        /usr/local/Cellar/go/1.22.0/libexec/src/runtime/asm_amd64.s:1695
                Error:          Not equal: 
                                expected: 1
                                actual  : 0
                Messages:       scale-up should have occurred
    util_test.go:67: 
                Error Trace:    /Users/nick/work/substratusai/lingo.pubsub/tests/integration/util_test.go:67
                                                        /Users/nick/work/substratusai/lingo.pubsub/tests/integration/subscriber_test.go:44
                Error:          Condition never satisfied
                Test:           TestSubscriber
                Messages:       waiting for the deployment to be scaled up
--- FAIL: TestSubscriber (8.01s)
FAIL

@nstogner
Contributor Author

The most recent commit appears to have fixed the race condition in the integration tests; running a lot of back-to-back tests now to make sure.

@nstogner
Contributor Author

nstogner commented Apr 1, 2024

Ready for testing on GCP.

A very rudimentary test shows a request and response.

Running the controller:

MESSENGER_URLS='gcppubsub://projects/my-project/subscriptions/lingo-requests-sub|gcppubsub://projects/my-project/topics/lingo-responses' go run ./cmd/lingo/main.go

Sending a request:

$ gcloud pubsub topics publish lingo-requests \
  --message='{"path":"/v1/completions", "metadata":{"a":"b"}, "body": {"model": "mdl-1"}}'
messageIds:
- '10824071783903012'

I get a response:

$ gcloud pubsub subscriptions pull lingo-responses-sub --auto-ack
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────┬───────────────────┬──────────────┬──────────────────────────────────────┬──────────────────┬────────────┐
│                                                    DATA                                                    │     MESSAGE_ID    │ ORDERING_KEY │              ATTRIBUTES              │ DELIVERY_ATTEMPT │ ACK_STATUS │
├────────────────────────────────────────────────────────────────────────────────────────────────────────────┼───────────────────┼──────────────┼──────────────────────────────────────┼──────────────────┼────────────┤
│ {"metadata":{"a":"b"},"status_code":404,"body":{"error":{"message":"backend not found for model: mdl-1"}}} │ 10824059966496759 │              │ request_message_id=10824071783903012 │                  │ SUCCESS    │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────┴───────────────────┴──────────────┴──────────────────────────────────────┴──────────────────┴────────────┘

@nstogner nstogner changed the title WIP: PubSub integration PubSub integration Apr 1, 2024
resp.Ack()

require.JSONEq(t, fmt.Sprintf(`
{
Contributor Author

Note: error format - .body should match OpenAI's errors.


// Slow down a bit to avoid churning through messages and running
// up cloud costs when no meaningful work is being done.
if consecutiveErrors := m.getConsecutiveErrors(); consecutiveErrors > 0 {
Contributor

I think there is also a risk that occasionally a short spike of errors would slow things down.

Contributor Author
@nstogner Apr 1, 2024

True, we will probably need to tune this over time. I think its a good thing to slow stuff down when errors start building up. Right now the wait time will go back to zero once a single message is processed successfully.

I added this delay to account for a few cases:

  • Spontaneous failures that might creep up overnight.
  • Some job sending a million malformed requests into a topic and lingo churning through them racking up GPU and PubSub costs.

Contributor Author

Added comment containing these thoughts

//
// URL Examples:
//
// Google PubSub:
Contributor

I would like to see an example in this format
"gcppubsub://projects/my-project/subscriptions/my-subscription|gcppubsub://projects/myproject/topics/mytopic"

Contributor Author

Agreed

Contributor Author

Done

@samos123
Contributor

samos123 commented Apr 1, 2024

Very nice! I think we should merge this as an experimental MVP and iterate on it. I fixed the Docker build and e2e tests by upgrading our Docker image to golang 1.22.

I've also verified this works in a GCP environment that has mistral and mixtral deployed.

@samos123
Contributor

samos123 commented Apr 1, 2024

One thing I don't get: how do we limit the maximum number of concurrent open requests? It seems there is currently no way to set such a limit?


log.Printf("Entering queue: %s", msg.LoggableID)

complete := m.Queues.EnqueueAndWait(ctx, backendDeployment, msg.LoggableID)
Contributor Author

This should block the entire receive loop.

@nstogner
Contributor Author

nstogner commented Apr 1, 2024

> One thing I don't get is how do we limit the maximum amount of concurrent open requests? Seems right now there is no way to set such limit?

The call to .EnqueueAndWait should do this: https://github.com/substratusai/lingo/pull/88/files#r1546379421

@nstogner nstogner changed the title PubSub integration Messaging integration (PubSub, SQS, etc) Apr 1, 2024
@nstogner nstogner changed the title Messaging integration (PubSub, SQS, etc) Messaging integration (GCP PubSub, AWS SQS, Kafka, etc) Apr 1, 2024
@nstogner
Contributor Author

nstogner commented Apr 1, 2024

Added Issue to track retry functionality: #89

I am good to merge as-is.

@nstogner nstogner merged commit 66cc783 into main Apr 1, 2024
6 checks passed
@nstogner nstogner deleted the pubsub branch April 1, 2024 15:26
Successfully merging this pull request may close these issues.

Batch support through Pub/Sub