docs: add kafka-java to Kafka source/sink docs (#2347)
Signed-off-by: Keran Yang <[email protected]>
KeranYang authored Jan 22, 2025
1 parent bdab759 commit 93bf8f8
Showing 2 changed files with 40 additions and 3 deletions.
22 changes: 20 additions & 2 deletions docs/user-guide/sinks/kafka.md
@@ -1,16 +1,34 @@
# Kafka Sink

Two methods are available for integrating Kafka topics into your Numaflow pipeline:
using a user-defined Kafka Sink or opting for the built-in Kafka Sink provided by Numaflow.

## Option 1: User-Defined Kafka Sink

Developed and maintained by the Numaflow contributor community,
the [Kafka Sink](https://github.com/numaproj-contrib/kafka-java) provides a reliable and feature-complete solution for publishing messages to Kafka topics.

Key Features:

* **Customization:** Offers complete control over Kafka Sink configurations to tailor to specific requirements.
* **Kafka Java Client Utilization:** Leverages the Kafka Java client for reliable message publishing to Kafka topics.
* **Schema Management:** Integrates seamlessly with the Confluent Schema Registry to support schema validation and manage schema evolution effectively.

More details on how to use the Kafka Sink can be found [here](https://github.com/numaproj-contrib/kafka-java?tab=readme-ov-file#write-data-to-kafka).
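
In a pipeline spec, a user-defined sink is referenced as a `udsink` container. A minimal sketch (the image reference below is an illustrative assumption; use the image published by the kafka-java project):

```yaml
spec:
  vertices:
    - name: kafka-output
      sink:
        udsink:
          container:
            # Illustrative image reference, not a pinned release
            image: quay.io/numaio/kafka-java:v0.1.0
```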

## Option 2: Built-in Kafka Sink

A `Kafka` sink is used to forward messages to a Kafka topic. The Kafka sink supports configuration overrides.

### Kafka Headers

We insert the `keys` of a message into the Kafka headers. Since `keys` is an array, the keys are added to the headers in the following format, illustrated below:

* `__keys_len` holds the number of keys in the headers; if `__keys_len` == `0`, no keys are present.
* `__keys_%d` holds each key, e.g., `__keys_0` is the first key, and so forth.
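
For example, a message with two keys, `key-a` and `key-b`, would carry headers like these (shown as YAML for illustration; header values are byte arrays on the wire):

```yaml
__keys_len: "2"
__keys_0: "key-a"
__keys_1: "key-b"
```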

### Example

```yaml
spec:
  # ...
```
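
For illustration, a minimal spec for the built-in Kafka sink might look like the following sketch (the vertex name, broker addresses, and topic are assumptions, not values from this commit):

```yaml
spec:
  vertices:
    - name: kafka-output
      sink:
        kafka:
          brokers:
            # Illustrative broker addresses
            - my-broker1:19700
            - my-broker2:19700
          # Illustrative topic name
          topic: my-topic
```
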
21 changes: 20 additions & 1 deletion docs/user-guide/sources/kafka.md
@@ -1,6 +1,25 @@
# Kafka Source

Two methods are available for integrating Kafka topics into your Numaflow pipeline:
using a user-defined Kafka Source or opting for the built-in Kafka Source provided by Numaflow.

## Option 1: User-Defined Kafka Source

Developed and maintained by the Numaflow contributor community,
the [Kafka Source](https://github.com/numaproj-contrib/kafka-java) offers a robust and feature-complete solution
for integrating Kafka as a data source into your Numaflow pipeline.

Key Features:

* **Flexibility:** Allows full customization of Kafka Source configurations to suit specific needs.
* **Kafka Java Client Utilization:** Leverages the Kafka Java client for robust message consumption from Kafka topics.
* **Schema Management:** Integrates seamlessly with the Confluent Schema Registry to support schema validation and manage schema evolution effectively.

More details on how to use the Kafka Source can be found [here](https://github.com/numaproj-contrib/kafka-java/blob/main/README.md#read-data-from-kafka).
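
In a pipeline spec, a user-defined source is referenced as a `udsource` container. A minimal sketch (the image reference below is an illustrative assumption; use the image published by the kafka-java project):

```yaml
spec:
  vertices:
    - name: kafka-input
      source:
        udsource:
          container:
            # Illustrative image reference, not a pinned release
            image: quay.io/numaio/kafka-java:v0.1.0
```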

## Option 2: Built-in Kafka Source

Numaflow provides a built-in `Kafka` source to ingest messages from a Kafka topic. The source uses consumer-groups to manage offsets.

```yaml
spec:
  # ...
```
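
For illustration, a minimal spec for the built-in Kafka source might look like the following sketch (the vertex name, broker addresses, topic, and consumer group are assumptions, not values from this commit):

```yaml
spec:
  vertices:
    - name: kafka-input
      source:
        kafka:
          brokers:
            # Illustrative broker addresses
            - my-broker1:19700
            - my-broker2:19700
          # Illustrative topic and consumer group
          topic: my-topic
          consumerGroup: my-consumer-group
```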
