Messaging integration (GCP PubSub, AWS SQS, Kafka, etc) #88
Conversation
I was thinking, to keep it simple initially, it might be easier to listen to the topic and do a curl to localhost. That way we can reuse all the existing logic instead of implementing it twice. We can optimize later.
We need a way of determining how many concurrent messages to process and when to stop pulling new messages off of the subscription. The easiest way is to invoke the functions directly because that's where we have the info. The alternative would be to always process a given number of requests concurrently and wait for the proxy handler to return. I see that as being harder to debug, and it also adds another layer of concurrency settings.
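To make that trade-off concrete, here is a minimal sketch of the "proxy to localhost" option from the first comment. The port, the idea of taking the path from the message, and the function name are assumptions for illustration, not the PR's code.

    import (
        "bytes"
        "context"
        "fmt"
        "io"
        "net/http"

        "gocloud.dev/pubsub"
    )

    // forwardToLocalProxy reuses the existing HTTP proxy logic by POSTing the
    // message body to the locally running server. Port 8080 and the path
    // handling are illustrative assumptions.
    func forwardToLocalProxy(ctx context.Context, msg *pubsub.Message, path string) ([]byte, error) {
        url := fmt.Sprintf("http://localhost:8080%s", path)
        req, err := http.NewRequestWithContext(ctx, http.MethodPost, url, bytes.NewReader(msg.Body))
        if err != nil {
            return nil, err
        }
        req.Header.Set("Content-Type", "application/json")
        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            return nil, err
        }
        defer resp.Body.Close()
        return io.ReadAll(resp.Body)
    }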
NOTE: Currently the code is expecting a request message that looks like the following, which is slightly different from the issue's description.

    {
      # Standard OpenAI fields
      "model": "...",
      "prompt": "What is the ...",

      # Lingo-specific subscriber fields
      "path": "/v1/completions",
      "metadata": {
        "optional-key": "optional-val"
      }
    }

I am back and forth on whether we should nest all of the OpenAI fields under a "body" field. I am planning on making the update to nest under "body".
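For reference, a sketch of the nested shape after that change; the OpenAI field values are placeholders, and the real request used in the GCP test further down follows this layout.

    {
      "path": "/v1/completions",
      "metadata": {
        "optional-key": "optional-val"
      },
      "body": {
        # Standard OpenAI fields, now nested under "body"
        "model": "...",
        "prompt": "What is the ..."
      }
    }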
A few changes are still needed:
I do think it's the best longer-term approach so we can have more control over queueing. It's important that we only ack messages that have had their response sent back to a PubSub topic, so we would still have to wait. I think there might be a timeout from PubSub by which it needs an ack.
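A minimal sketch of the ack-only-after-response ordering described here, using the gocloud pubsub types the project is built on; the helper name and the process callback are illustrative assumptions.

    import (
        "context"

        "gocloud.dev/pubsub"
    )

    // handleAndAck only acks the request message once its response has been
    // published. If publishing fails, the message is left unacked so PubSub
    // redelivers it after the ack deadline expires.
    func handleAndAck(ctx context.Context, msg *pubsub.Message, responses *pubsub.Topic, process func([]byte) []byte) error {
        out := process(msg.Body)
        if err := responses.Send(ctx, &pubsub.Message{Body: out}); err != nil {
            return err // not acked; will be redelivered
        }
        msg.Ack()
        return nil
    }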
Appears to be a race somewhere in the integration tests (only fails sometimes):
The most recent commit appears to have fixed the race condition in the integration tests; running a lot of back-to-back tests now to make sure.
Ready for testing on GCP. A very rudimentary test shows a request and response.

Running the controller:

    MESSENGER_URLS='gcppubsub://projects/my-project/subscriptions/lingo-requests-sub|gcppubsub://projects/my-project/topics/lingo-responses' go run ./cmd/lingo/main.go

Sending a request:

    $ gcloud pubsub topics publish lingo-requests \
        --message='{"path":"/v1/completions", "metadata":{"a":"b"}, "body": {"model": "mdl-1"}}'
    messageIds:
    - '10824071783903012'

I get a response:

    $ gcloud pubsub subscriptions pull lingo-responses-sub --auto-ack
    DATA:        {"metadata":{"a":"b"},"status_code":404,"body":{"error":{"message":"backend not found for model: mdl-1"}}}
    MESSAGE_ID:  10824059966496759
    ATTRIBUTES:  request_message_id=10824071783903012
    ACK_STATUS:  SUCCESS
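The same request can also be published from Go instead of the gcloud CLI. A sketch using gocloud.dev/pubsub; the project/topic URL mirrors the commands above, and the function name is made up.

    import (
        "context"

        "gocloud.dev/pubsub"
        _ "gocloud.dev/pubsub/gcppubsub" // register the gcppubsub:// scheme
    )

    // publishTestRequest sends the same test message as the gcloud command above.
    func publishTestRequest(ctx context.Context) error {
        topic, err := pubsub.OpenTopic(ctx, "gcppubsub://projects/my-project/topics/lingo-requests")
        if err != nil {
            return err
        }
        defer topic.Shutdown(ctx)
        return topic.Send(ctx, &pubsub.Message{
            Body: []byte(`{"path":"/v1/completions", "metadata":{"a":"b"}, "body": {"model": "mdl-1"}}`),
        })
    }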
    resp.Ack()

    require.JSONEq(t, fmt.Sprintf(`
    {
Note: the error format (.body) should match OpenAI's errors.
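For comparison, OpenAI's error responses wrap the details in an error object roughly like the following; the type and code values here are placeholders.

    {
      "error": {
        "message": "backend not found for model: mdl-1",
        "type": "invalid_request_error",
        "param": null,
        "code": null
      }
    }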
    // Slow down a bit to avoid churning through messages and running
    // up cloud costs when no meaningful work is being done.
    if consecutiveErrors := m.getConsecutiveErrors(); consecutiveErrors > 0 {
I think there is also a risk that an occasional short spike of errors would slow things down.
True, we will probably need to tune this over time. I think it's a good thing to slow things down when errors start building up. Right now the wait time will go back to zero once a single message is processed successfully.
I added this delay to account for a few cases (see the sketch after this list):
- Spontaneous failures that might creep up overnight.
- Some job sending a million malformed requests into a topic and Lingo churning through them, racking up GPU and PubSub costs.
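A minimal sketch of that delay logic; the linear growth and the 30-second cap are illustrative values, not necessarily what the PR uses.

    import (
        "log"
        "time"
    )

    // backoffForErrors sleeps before the next receive when errors are piling up,
    // so a stream of bad messages doesn't rack up GPU and PubSub costs.
    func backoffForErrors(consecutiveErrors int) {
        if consecutiveErrors == 0 {
            return
        }
        delay := time.Duration(consecutiveErrors) * time.Second
        const maxDelay = 30 * time.Second
        if delay > maxDelay {
            delay = maxDelay
        }
        log.Printf("waiting %v after %d consecutive errors", delay, consecutiveErrors)
        time.Sleep(delay)
    }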
Added a comment containing these thoughts.
cmd/lingo/main.go (outdated)
    //
    // URL Examples:
    //
    // Google PubSub:
I would like to see an example in this format:
"gcppubsub://projects/my-project/subscriptions/my-subscription|gcppubsub://projects/myproject/topics/mytopic"
Agreed
Done
Very nice! I think we should get this merged in as an experimental MVP and iterate on it. I fixed the Docker build and e2e tests by upgrading our Docker image to Go 1.22. I've also verified this works in a GCP environment that has Mistral and Mixtral deployed.
One thing I don't get is how we limit the maximum number of concurrent open requests. It seems there is currently no way to set such a limit?
    log.Printf("Entering queue: %s", msg.LoggableID)

    complete := m.Queues.EnqueueAndWait(ctx, backendDeployment, msg.LoggableID)
This should block the entire receive loop. The call to EnqueueAndWait ...
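One way to put an explicit cap on in-flight requests without blocking the receive loop on every message is a semaphore around the handler. This is a sketch of that alternative, not what the PR implements; maxConcurrent and handle are made-up names, and acking is left to the handler so the ack-after-response ordering discussed above still holds.

    import (
        "context"

        "gocloud.dev/pubsub"
    )

    // receiveLoop stops pulling from the subscription once maxConcurrent
    // messages are being handled; each completed handler frees a slot.
    func receiveLoop(ctx context.Context, sub *pubsub.Subscription, maxConcurrent int,
        handle func(context.Context, *pubsub.Message)) error {
        sem := make(chan struct{}, maxConcurrent)
        for {
            msg, err := sub.Receive(ctx)
            if err != nil {
                return err
            }
            sem <- struct{}{} // blocks while maxConcurrent handlers are busy
            go func(m *pubsub.Message) {
                defer func() { <-sem }()
                handle(ctx, m)
            }(msg)
        }
    }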
Added an issue to track retry functionality: #89. I am good to merge as-is.
Add messaging integration (consume requests and produce responses via a messaging system).
Implemented via the gocloud package to allow for future cross-cloud support.
Also refactors configuration to use environment variables exclusively.
Fixes #86
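Since configuration is now environment-variable driven, a hedged sketch of parsing the MESSENGER_URLS value shown in the GCP test above into its subscription and topic halves; the helper name is made up, and support for multiple pairs is not assumed.

    import (
        "fmt"
        "os"
        "strings"
    )

    // parseMessengerURLs splits a "<requests-subscription-url>|<responses-topic-url>"
    // pair, e.g. the value used when running the controller in the GCP test above.
    func parseMessengerURLs() (subscriptionURL, topicURL string, err error) {
        v := os.Getenv("MESSENGER_URLS")
        parts := strings.SplitN(v, "|", 2)
        if len(parts) != 2 || parts[0] == "" || parts[1] == "" {
            return "", "", fmt.Errorf("MESSENGER_URLS must look like <subscription-url>|<topic-url>, got %q", v)
        }
        return parts[0], parts[1], nil
    }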