Dataflow managed I/O for Apache Kafka

Managed I/O supports reading from and writing to Apache Kafka.

Requirements

Requires the Apache Beam SDK for Java, version 2.58.0 or later.

Configuration

Managed I/O uses the following configuration parameters for Apache Kafka.

Read and write configuration

  • bootstrap_servers (string): Required. A comma-separated list of Kafka bootstrap servers. Example: localhost:9092.
  • topic (string): Required. The Kafka topic to read from or write to.
  • file_descriptor_path (string): The path to a protocol buffer file descriptor set. Applies only if data_format is "PROTO".
  • data_format (string): The format of the messages. Supported values: "AVRO", "JSON", "PROTO", "RAW". The default value is "RAW", which reads or writes the raw bytes of the message payload.
  • message_name (string): The name of the protocol buffer message. Required if data_format is "PROTO".
  • schema (string): The Kafka message schema. The expected schema type depends on the data format. For read pipelines, this parameter is ignored if confluent_schema_registry_url is set.

Read configuration

  • auto_offset_reset_config (string): Specifies the behavior when there is no initial offset or when the current offset no longer exists on the Kafka server. The following values are supported:
      • "earliest": Reset the offset to the earliest offset.
      • "latest": Reset the offset to the latest offset.
    The default value is "latest".
  • confluent_schema_registry_subject (string): The subject of a Confluent schema registry. Required if confluent_schema_registry_url is specified.
  • confluent_schema_registry_url (string): The URL of a Confluent schema registry. If specified, the schema parameter is ignored.
  • consumer_config_updates (map): Sets configuration parameters for the Kafka consumer. For more information, see Consumer configs in the Kafka documentation. You can use this parameter to customize the Kafka consumer.
  • max_read_time_seconds (int): The maximum read time, in seconds. This option produces a bounded PCollection and is mainly intended for testing or other non-production scenarios.
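As a minimal sketch, the read parameters above might be assembled into a configuration map as follows. The map keys are the parameter names from the table; the topic name, consumer group ID, and the commented-out Managed.read call are illustrative assumptions, not values from this document.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaReadConfig {
    public static Map<String, Object> build() {
        // Keys are the Managed I/O read parameters documented above.
        Map<String, Object> config = new HashMap<>();
        config.put("bootstrap_servers", "localhost:9092");  // required
        config.put("topic", "my-topic");                    // required; hypothetical topic name
        config.put("auto_offset_reset_config", "earliest"); // default is "latest"

        // Pass additional Kafka consumer settings through consumer_config_updates.
        Map<String, Object> consumerUpdates = new HashMap<>();
        consumerUpdates.put("group.id", "my-consumer-group"); // hypothetical group ID
        config.put("consumer_config_updates", consumerUpdates);

        // Bound the read for testing; produces a bounded PCollection.
        config.put("max_read_time_seconds", 60);
        return config;
    }

    public static void main(String[] args) {
        Map<String, Object> config = KafkaReadConfig.build();
        // In a Beam pipeline, this map would be passed to Managed I/O, for example:
        //   pipeline.apply(Managed.read(Managed.KAFKA).withConfig(config));
        System.out.println(config.size());
    }
}
```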
Write configuration

  • producer_config_updates (map): Sets configuration parameters for the Kafka producer. For more information, see Producer configs in the Kafka documentation. You can use this parameter to customize the Kafka producer.

To read Avro or JSON messages, you must specify a message schema. To set a schema directly, use the schema parameter. To provide the schema through a Confluent schema registry, set the confluent_schema_registry_url and confluent_schema_registry_subject parameters.
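The two ways of supplying a schema can be sketched as the following configuration maps. The JSON schema string, topic name, registry URL, and subject are hypothetical placeholders; only the parameter names come from the tables above.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaSchemaConfig {
    // Option 1: set the schema directly with the schema parameter.
    public static Map<String, Object> withInlineSchema() {
        Map<String, Object> config = new HashMap<>();
        config.put("bootstrap_servers", "localhost:9092");
        config.put("topic", "my-topic"); // hypothetical topic
        config.put("data_format", "JSON");
        // Hypothetical JSON schema for the message payload.
        config.put("schema",
            "{\"type\":\"object\",\"properties\":{\"id\":{\"type\":\"integer\"}}}");
        return config;
    }

    // Option 2: resolve the schema from a Confluent schema registry.
    // When confluent_schema_registry_url is set, the schema parameter is ignored.
    public static Map<String, Object> withSchemaRegistry() {
        Map<String, Object> config = new HashMap<>();
        config.put("bootstrap_servers", "localhost:9092");
        config.put("topic", "my-topic");
        config.put("data_format", "AVRO");
        config.put("confluent_schema_registry_url", "http://localhost:8081"); // hypothetical URL
        config.put("confluent_schema_registry_subject", "my-topic-value");    // hypothetical subject
        return config;
    }

    public static void main(String[] args) {
        System.out.println(withInlineSchema().containsKey("schema"));
        System.out.println(withSchemaRegistry().containsKey("confluent_schema_registry_url"));
    }
}
```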

To read or write Protocol Buffer messages, either specify a message schema or set the file_descriptor_path parameter.
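For the file descriptor approach, a Protocol Buffer configuration might look like the following sketch. The descriptor path and message name are hypothetical; a descriptor set file is typically produced with protoc's --descriptor_set_out flag.

```java
import java.util.HashMap;
import java.util.Map;

public class KafkaProtoConfig {
    public static Map<String, Object> build() {
        Map<String, Object> config = new HashMap<>();
        config.put("bootstrap_servers", "localhost:9092");
        config.put("topic", "my-topic"); // hypothetical topic
        config.put("data_format", "PROTO");
        // Hypothetical path to a compiled file descriptor set.
        config.put("file_descriptor_path", "gs://my-bucket/descriptors.pb");
        // Required when data_format is "PROTO"; hypothetical message name.
        config.put("message_name", "com.example.MyMessage");
        return config;
    }

    public static void main(String[] args) {
        System.out.println(KafkaProtoConfig.build().get("data_format"));
    }
}
```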

For more information and code examples, see the following topics: