Kafka

Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation and written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.

note

You need to have a running instance of Apache Kafka before you can integrate it with Gigapipe.

Once you have a running Apache Kafka server, follow these instructions:

Step 1 - Connect your Gigapipe account with an instance of Apache Kafka

  • On the Integrations page, click the New Kafka integration button
  • In the form that opens, fill in the following fields (a connectivity sketch using the same parameters follows this list):
    • Integration name: A name to identify the integration.
    • Bootstrap servers: A 'host[:port]' string (or list of 'host[:port]' strings) that the consumer should contact to bootstrap initial cluster metadata. This does not have to be the full node list; it only needs at least one broker that will respond to a Metadata API request. The default port is 9092. If no servers are specified, it defaults to localhost:9092.
    • Security protocol: The protocol used to communicate with brokers. Valid values are PLAINTEXT, SSL, SASL_PLAINTEXT, and SASL_SSL.
    • SASL mechanism: The authentication mechanism used when the security protocol is SASL_PLAINTEXT or SASL_SSL. Valid values are PLAIN, GSSAPI, OAUTHBEARER, SCRAM-SHA-256, and SCRAM-SHA-512.
    • SASL username: The username for SASL PLAIN and SCRAM authentication. Required if the SASL mechanism is PLAIN or one of the SCRAM mechanisms.
    • SASL password: The password for SASL PLAIN and SCRAM authentication. Required if the SASL mechanism is PLAIN or one of the SCRAM mechanisms.
    • Client ID: A name for this client. This string is passed in each request to servers and can be used to identify server-side log entries that correspond to this client. It is also submitted to the GroupCoordinator for logging related to consumer group administration.
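
Before saving the integration, you can sanity-check these values with any Kafka client. The following is a minimal sketch using the kafka-python library (not part of Gigapipe); the host, credentials and client ID are placeholders, and the security settings assume a SASL_SSL broker with SCRAM-SHA-256 authentication:

  # Connectivity check with the same parameters the integration form asks for.
  # All values below are placeholders for your own broker and credentials.
  from kafka import KafkaConsumer

  consumer = KafkaConsumer(
      bootstrap_servers="kafka.example.com:9092",  # Bootstrap servers
      security_protocol="SASL_SSL",                # Security protocol
      sasl_mechanism="SCRAM-SHA-256",              # SASL mechanism
      sasl_plain_username="gigapipe",              # SASL username
      sasl_plain_password="secret",                # SASL password
      client_id="gigapipe-integration",            # Client ID
  )

  # If the bootstrap connection and authentication succeed, this lists the
  # topics visible to the configured credentials.
  print(consumer.topics())
  consumer.close()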

Step 2 - Connect the integration with a cluster

  • On the Integrations page, click the Connect to cluster button below the Kafka integration you want to connect to a cluster
  • Select a Cluster to connect the integration to
  • Optionally, set any Kafka settings as key/value pairs

Step 3 - Connect the integration with a table

  • On the Integrations page, click the Connect to table button below the Kafka integration you want to connect to a table
  • Select a Cluster, Database and Table
  • Use the column picker to choose, edit or add columns to match the data coming from Kafka
  • Select the topics to subscribe to
  • On the next page:
    • Queue name: The name of the Kafka table that is initialized with the configuration and connects to the server.
    • Select query: The materialized view related to the queue name.
    • Group name: The group of Kafka consumers. Using the same group name avoids duplicated messages.
    • Format: The format in which the data is streamed to Kafka.
    • Row delimiter: The row delimiter character (e.g. \n for CSV); the producer sketch after this list shows how the format and delimiter appear on the wire.
    • Consumers: The number of consumer threads that subscribe to the Kafka events. The number of consumers cannot exceed the total number of topic partitions (the partition check after this list shows how to look this up).
  • Advanced settings (optional):
    • Max block size: The maximum number of messages per batch.
    • Skip broken messages: The fault tolerance for invalid messages, i.e. how many broken messages to skip.
    • Commit every batch: The number of batches to handle before committing them.
    • Thread per consumer: The number of threads to spawn per consumer.
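
To see what the configured format and row delimiter correspond to on the Kafka side, the following sketch produces a few newline-delimited CSV rows to a topic with the kafka-python library. The broker address, topic name and column layout are placeholders chosen for illustration, not values defined by Gigapipe:

  # Produce a few CSV rows, one per message, terminated by the "\n" row delimiter.
  # Broker, topic and columns are placeholder assumptions for this example.
  from kafka import KafkaProducer

  producer = KafkaProducer(bootstrap_servers="localhost:9092")

  rows = [
      (1, "2024-01-01 00:00:00", 3.14),
      (2, "2024-01-01 00:00:01", 2.71),
  ]

  for row in rows:
      line = ",".join(str(value) for value in row) + "\n"
      producer.send("my_topic", line.encode("utf-8"))

  producer.flush()
  producer.close()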
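
Because the number of consumers cannot exceed the number of partitions of the subscribed topics, it can help to check the partition count before choosing a value. A minimal sketch with kafka-python, where the broker and topic name are again placeholders:

  # Count the partitions of a topic to pick an upper bound for Consumers.
  from kafka import KafkaConsumer

  consumer = KafkaConsumer(bootstrap_servers="localhost:9092")
  partitions = consumer.partitions_for_topic("my_topic")  # set of partition ids, or None if unknown
  print(len(partitions or []))
  consumer.close()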