Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow oneof in our protobuf gRPC interfaces? #8588

Open
emilk opened this issue Jan 6, 2025 · 5 comments
Open

Allow oneof in our protobuf gRPC interfaces? #8588

emilk opened this issue Jan 6, 2025 · 5 comments
Labels
dataplatform Rerun Data Platform integration remote-store remote store gRPC API

Comments

@emilk
Copy link
Member

emilk commented Jan 6, 2025

This is a design-decision we need to take on our design of our protobuf/gRPC messages:

  • Should we allow the use of oneof in some places?
  • If yes, what are the acceptable and unacceptable uses of it?
  • If no, what is the suggested alternative?

What is oneof

oneof is a feature for implementing sum-types in probufs (see oneof docs).

Why not use oneof

A common complaint of oneof is that it produces complex serializer/deserializer code in languages that don't support sum-types… which is almost all languages, except Rust.

Another drawback of oneof is that one needs to be careful when adding/removing variants, but I think that goes for most of the alternatives to it too (see below).

What do we want to use oneof for?

We want to use them for messages from our log API to the viewer. These messages are one of:

  • SetStoreInfo
  • ArrowMsg (a chunk)
  • BlueprintCommand

We need the ordering of blueprint commands and chunks to be predictable, so we cannot use separate streams for these.

We will also store these messages as protobufs in our .rrd files.

Alternatives

We have these:

message StoreInfo { … }
message ArrowMsg { … }
message BlueprintCommand { … }

How do we send them over a single stream?

The two streams we care about are:

  • .rrd files
  • gRPC streams

Ideally we pick the same solution for both.

Using oneof

message Message {
    oneof variant {
        StoreInfo store_info = 1;
        ArrowMsg arrow_msg = 2;
        BlueprintCommand blueprint_cmd = 3;
    }
}

PRO: simple for Rust users (us)
CON: difficult for non-Rust users (others)

It could make it harder for others to write parsers of .rrd files in non-Rust languages. Do we even care?

NOTE: this does NOT affect how we store data in our indexed database files (which will likely use Lance or similar).

Optional fields

message Message {
    // Exactly one of these is set; pinky-promise!
    optional StoreInfo store_info = 1;
    optional ArrowMsg arrow_msg = 2;
    optional BlueprintCommand blueprint_cmd = 3;
}

PRO: simple
CON: what happens if more than one field is set?

Roll-your-own

message Message {
    enum Type {
        STORE_IUNFO = 0;
        ARROW_MSG = 1;
        BLUEPRINT_CMD = 2;
    }

    Type type = 1;

    /// Either `StoreInfo` or `ArrowMsg` or `BlueprintCommand`, encoded using protobuf.
    /// See `type` for which is which.
    bytes contents = 2;
}

CON: a bit silly to have a protobuf inside a protobuf

@emilk emilk added dataplatform Rerun Data Platform integration remote-store remote store gRPC API labels Jan 6, 2025
@jleibs
Copy link
Member

jleibs commented Jan 6, 2025

We need the ordering of blueprint commands and chunks to be predictable, so we cannot use separate streams for these.

This assumption seems worth questioning. I believe that interlacing blueprint commands + recording data in a single rerun-stream is a significant contributor to complexity of our protocol.

Why can't these simply exist as two parallel streams at the grpc-layer? Send blueprint commands over a blueprint control interface, and send recording data over a chunk streaming interface.

@jprochazk
Copy link
Member

Why can't these simply exist as two parallel streams at the grpc-layer? Send blueprint commands over a blueprint control interface, and send recording data over a chunk streaming interface.

As far as I understand, that is definitely the long-term plan, but we've decided not to do that yet, and keep the LogMsg "command" enum with the same semantics that we've had up until now.

@emilk
Copy link
Member Author

emilk commented Jan 6, 2025

See also

/// Send after all messages in a blueprint to signal that the blueprint is complete.
///
/// This is so that the viewer can wait with activating the blueprint until it is
/// fully transmitted. Showing a half-transmitted blueprint can cause confusion,
/// and also lead to problems with view heuristics.
BlueprintActivationCommand(BlueprintActivationCommand),

@emilk
Copy link
Member Author

emilk commented Jan 6, 2025

After some discussion we've decided to allow oneof for a temporary gRPC endpoint in order to make the transition to gRPC easier. After that we hope to be able to remove it (together with enum LogMsg), and replace it with separate endpoints (one for log info, one for chunks, one for blueprint messages, etc).

@jprochazk
Copy link
Member

I think some more message types like StoreSource should also be using oneof.

Opened a separate issue to get rid of LogMsg:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dataplatform Rerun Data Platform integration remote-store remote store gRPC API
Projects
None yet
Development

No branches or pull requests

3 participants