Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation and written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
You need to have a running instance of Apache Kafka before you can integrate it with Gigapipe.
Once you have a running Apache Kafka server, follow these instructions:
Step 1 - Connect your Gigapipe account with an instance of Apache Kafka
- On the Integrations page, click the New Kafka integration button
- In the form that opens, fill in the following fields (a connectivity check follows this list):
  - **Integration name**: A name to identify the integration.
  - **Bootstrap servers**: A `host[:port]` string (or list of `host[:port]` strings) that the consumer should contact to bootstrap initial cluster metadata. This does not have to be the full node list; it just needs at least one broker that will respond to a Metadata API request. The default port is 9092. If no servers are specified, this defaults to `localhost:9092`.
  - **Security protocol**: The protocol used to communicate with brokers. Valid values are PLAINTEXT, SSL, SASL_PLAINTEXT, and SASL_SSL.
  - **SASL mechanism**: The authentication mechanism used when the security protocol is SASL_PLAINTEXT or SASL_SSL. Valid values are PLAIN, GSSAPI, OAUTHBEARER, SCRAM-SHA-256, and SCRAM-SHA-512.
  - **SASL username**: The username for SASL PLAIN and SCRAM authentication. Required if the SASL mechanism is PLAIN or one of the SCRAM mechanisms.
  - **SASL password**: The password for SASL PLAIN and SCRAM authentication. Required if the SASL mechanism is PLAIN or one of the SCRAM mechanisms.
  - **Client ID**: A name for this client. The string is passed in each request to servers and can be used to identify specific server-side log entries that correspond to this client. It is also submitted to the GroupCoordinator for logging related to consumer group administration.
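These fields mirror standard Kafka client connection parameters. If you want to verify your values before saving the integration, here is a minimal connectivity check using the kafka-python client; the broker address, credentials, and client ID are placeholders for the values you plan to enter in the form:

```python
# Minimal connectivity check, assuming the kafka-python client
# (pip install kafka-python). All values below are placeholders.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    bootstrap_servers="broker1.example.com:9092",  # Bootstrap servers
    security_protocol="SASL_SSL",                  # Security protocol
    sasl_mechanism="SCRAM-SHA-256",                # SASL mechanism
    sasl_plain_username="gigapipe",                # SASL username
    sasl_plain_password="secret",                  # SASL password
    client_id="gigapipe-integration",              # Client ID
)

# topics() asks the cluster for its topic list; if this call
# succeeds, the same values should work in the integration form.
print(consumer.topics())
consumer.close()
```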
Step 2 - Connect the integration with a cluster
- On the Integrations page, click the Connect to cluster button below the Kafka integration you want to connect to a cluster
- Select a Cluster to connect the integration to
- Set any Kafka Settings as key/value pairs (optional)
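Before wiring the integration to a table in Step 3, it can help to have some sample data on the topic you intend to subscribe to. A small sketch using kafka-python; the topic name `events` and the message shape are hypothetical:

```python
# Publish a sample JSON message so Step 3 has data to map to columns.
# Assumes kafka-python; the topic name "events" is hypothetical.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker1.example.com:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("events", {"user_id": 42, "action": "login"})
producer.flush()  # block until the message is actually delivered
producer.close()
```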
Step 3 - Connect the integration with a table
- On the Integrations page, click the Connect to table button below the Kafka integration you want to connect to a table
- Select a Cluster, Database and Table
- Use the column picker to choose, edit, or add columns to match the data coming from Kafka
- Select the topics to subscribe to
- On the next page, fill in the following fields:
  - **Queue name**: The name of the Kafka table that is initialized with the configuration and connects to the server.
  - **Select query**: The materialized view related to the queue name.
  - **Group name**: The Kafka consumer group. Consumers that share the same group name avoid duplicated messages.
  - **Format**: The format in which the data will be streamed from Kafka.
  - **Row delimiter**: The row delimiter character (e.g. `\n`).
  - **Consumers**: The number of consumer threads that subscribe to the Kafka events. The number of consumers cannot exceed the total number of topic partitions.
- Advanced settings (optional), illustrated in the sketch after this list:
  - **Max block size**: The maximum number of messages per batch.
  - **Skip broken messages**: The fault tolerance for invalid messages, i.e. the number of broken messages to tolerate per block.
  - **Commit every batch**: Whether to commit after every handled batch instead of once per written block.
  - **Thread per consumer**: Whether to give each consumer its own independent thread.
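The Queue name, Select query, and Group name fields closely mirror ClickHouse's Kafka table engine and materialized views, which is presumably what backs a Gigapipe cluster. Purely as an illustration, and not Gigapipe's actual generated DDL, here is roughly what this step wires together, sketched with the clickhouse-driver Python client; every identifier and setting value below is a placeholder:

```python
# A rough sketch of the objects Step 3 describes, assuming the
# integration is backed by ClickHouse's Kafka table engine.
# All identifiers (mydb, events_queue, events_mv, events, the
# topic and group names) are illustrative placeholders.
from clickhouse_driver import Client

client = Client(host="localhost")  # assumes network access to the cluster

# "Queue name": the Kafka engine table that connects to the broker.
client.execute(r"""
CREATE TABLE mydb.events_queue (user_id UInt64, action String)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'broker1.example.com:9092',
         kafka_topic_list = 'events',          -- topics to subscribe to
         kafka_group_name = 'gigapipe_group',  -- Group name
         kafka_format = 'JSONEachRow',         -- Format
         kafka_row_delimiter = '\n',           -- Row delimiter
         kafka_num_consumers = 1,              -- Consumers
         kafka_max_block_size = 65536,         -- Max block size
         kafka_skip_broken_messages = 10,      -- Skip broken messages
         kafka_commit_every_batch = 0,         -- Commit every batch
         kafka_thread_per_consumer = 0         -- Thread per consumer
""")

# "Select query": the materialized view that moves rows from the
# queue table into the destination table chosen earlier.
client.execute(r"""
CREATE MATERIALIZED VIEW mydb.events_mv TO mydb.events AS
SELECT user_id, action FROM mydb.events_queue
""")
```

The materialized view is what actually moves the data: ClickHouse streams messages from the queue table through the SELECT query into the destination table as they arrive.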