From 93bf8f867e48bbbce90255f3f862197c33ded1e0 Mon Sep 17 00:00:00 2001
From: Keran Yang
Date: Wed, 22 Jan 2025 16:24:13 -0500
Subject: [PATCH] docs: add kafka-java to Kafka source/sink docs (#2347)

Signed-off-by: Keran Yang
---
 docs/user-guide/sinks/kafka.md   | 22 ++++++++++++++++++++--
 docs/user-guide/sources/kafka.md | 21 ++++++++++++++++++++-
 2 files changed, 40 insertions(+), 3 deletions(-)

diff --git a/docs/user-guide/sinks/kafka.md b/docs/user-guide/sinks/kafka.md
index 6b610b828..a7be8b595 100644
--- a/docs/user-guide/sinks/kafka.md
+++ b/docs/user-guide/sinks/kafka.md
@@ -1,8 +1,26 @@
 # Kafka Sink
 
+Two methods are available for integrating Kafka topics into your Numaflow pipeline:
+using a user-defined Kafka Sink or opting for the built-in Kafka Sink provided by Numaflow.
+
+## Option 1: User-Defined Kafka Sink
+
+Developed and maintained by the Numaflow contributor community,
+the [Kafka Sink](https://github.com/numaproj-contrib/kafka-java) provides a reliable and feature-complete solution
+for publishing messages to Kafka topics.
+
+Key Features:
+
+* **Customization:** Offers complete control over Kafka Sink configurations to tailor to specific requirements.
+* **Kafka Java Client Utilization:** Leverages the Kafka Java client for reliable message publishing to Kafka topics.
+* **Schema Management:** Integrates seamlessly with the Confluent Schema Registry to support schema validation and manage schema evolution effectively.
+
+More details on how to use the Kafka Sink can be found [here](https://github.com/numaproj-contrib/kafka-java?tab=readme-ov-file#write-data-to-kafka).
+
+## Option 2: Built-in Kafka Sink
+
 A `Kafka` sink is used to forward the messages to a Kafka topic. Kafka sink supports configuration overrides.
 
-## Kafka Headers
+### Kafka Headers
 
 We will insert `keys` into the Kafka header, but since `keys` is an array, we will add `keys` into the header in the
 following format.
@@ -10,7 +28,7 @@ following format.
 * `__keys_len` will have the number of `keys` in the header. If `__keys_len` == `0`, it means no `keys` are present.
 * `__keys_%d` will have the `key`, e.g., `__keys_0` will be the first key, and so forth.
 
-## Example
+### Example
 
 ```yaml
 spec:
diff --git a/docs/user-guide/sources/kafka.md b/docs/user-guide/sources/kafka.md
index 314d72fde..3f3db417f 100644
--- a/docs/user-guide/sources/kafka.md
+++ b/docs/user-guide/sources/kafka.md
@@ -1,6 +1,25 @@
 # Kafka Source
 
-A `Kafka` source is used to ingest the messages from a Kafka topic. Numaflow uses consumer-groups to manage offsets.
+Two methods are available for integrating Kafka topics into your Numaflow pipeline:
+using a user-defined Kafka Source or opting for the built-in Kafka Source provided by Numaflow.
+
+## Option 1: User-Defined Kafka Source
+
+Developed and maintained by the Numaflow contributor community,
+the [Kafka Source](https://github.com/numaproj-contrib/kafka-java) offers a robust and feature-complete solution
+for integrating Kafka as a data source into your Numaflow pipeline.
+
+Key Features:
+
+* **Flexibility:** Allows full customization of Kafka Source configurations to suit specific needs.
+* **Kafka Java Client Utilization:** Leverages the Kafka Java client for robust message consumption from Kafka topics.
+* **Schema Management:** Integrates seamlessly with the Confluent Schema Registry to support schema validation and manage schema evolution effectively.
+
+More details on how to use the Kafka Source can be found [here](https://github.com/numaproj-contrib/kafka-java/blob/main/README.md#read-data-from-kafka).
+
+## Option 2: Built-in Kafka Source
+
+Numaflow provides a built-in `Kafka` source to ingest messages from a Kafka topic. The source uses consumer-groups to manage offsets.
 
 ```yaml
 spec:
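
The `__keys_len` / `__keys_%d` header scheme the sink doc describes can be sketched from the consumer side. The helper below is a hypothetical illustration, not part of Numaflow or kafka-java; it assumes header values arrive as UTF-8 strings and takes a plain `Map` in place of the Kafka client's `Headers` type so it stays self-contained:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeysHeaderParser {

    // Rebuilds the original `keys` array from the headers the Kafka sink writes:
    // `__keys_len` carries the count, `__keys_0` .. `__keys_{n-1}` carry the keys.
    // A missing `__keys_len` (or a value of "0") means no keys were attached.
    public static List<String> parseKeys(Map<String, byte[]> headers) {
        byte[] lenBytes = headers.get("__keys_len");
        if (lenBytes == null) {
            return List.of();
        }
        int len = Integer.parseInt(new String(lenBytes, StandardCharsets.UTF_8));
        List<String> keys = new ArrayList<>(len);
        for (int i = 0; i < len; i++) {
            byte[] value = headers.get("__keys_" + i);
            if (value != null) {
                keys.add(new String(value, StandardCharsets.UTF_8));
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // Hypothetical example headers, as the sink would have written them.
        Map<String, byte[]> headers = new LinkedHashMap<>();
        headers.put("__keys_len", "2".getBytes(StandardCharsets.UTF_8));
        headers.put("__keys_0", "tenant-a".getBytes(StandardCharsets.UTF_8));
        headers.put("__keys_1", "region-1".getBytes(StandardCharsets.UTF_8));
        System.out.println(parseKeys(headers)); // prints [tenant-a, region-1]
    }
}
```

With the real Kafka Java client, the same loop would read from `ConsumerRecord.headers()` instead of a `Map`.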