Kafka
Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation and written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Note: You need to have a running instance of Apache Kafka before you can integrate it with Gigapipe.
Once you have a running Apache Kafka server, follow these instructions:
- Log in to your Gigapipe account
- Go to the Integrations page
Step 1 - Connect your Gigapipe account with an instance of Apache Kafka
- On the Integrations page, click the New Kafka integration button
- In the form that opens:
  - Integration name: A name to identify the integration.
  - Bootstrap servers: A host[:port] string (or list of host[:port] strings) that the consumer should contact to bootstrap initial cluster metadata. This does not have to be the full node list; it just needs at least one broker that will respond to a Metadata API request. The default port is 9092. If no servers are specified, this defaults to localhost:9092.
  - Security protocol: The protocol used to communicate with brokers. Valid values are PLAINTEXT, SSL, SASL_PLAINTEXT and SASL_SSL.
  - SASL mechanism: The authentication mechanism used when the security protocol is SASL_PLAINTEXT or SASL_SSL. Valid values are PLAIN, GSSAPI, OAUTHBEARER, SCRAM-SHA-256 and SCRAM-SHA-512.
  - SASL username: The username for SASL PLAIN and SCRAM authentication. Required if the SASL mechanism is PLAIN or one of the SCRAM mechanisms.
  - SASL password: The password for SASL PLAIN and SCRAM authentication. Required if the SASL mechanism is PLAIN or one of the SCRAM mechanisms.
  - Client ID: A name for this client. This string is passed in each request to servers and can be used to identify specific server-side log entries that correspond to this client. It is also submitted to the GroupCoordinator for logging with respect to consumer group administration.
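These fields mirror standard Kafka client settings, so you can sanity-check the values before creating the integration. The sketch below is one way to do that with the kafka-python package; the broker address, credentials and client ID are placeholders, and the security options should match whatever your cluster actually requires.

```python
# Minimal connectivity check with kafka-python (all values are placeholders).
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    bootstrap_servers=["broker-1.example.com:9092"],  # "Bootstrap servers"
    security_protocol="SASL_SSL",                     # "Security protocol"
    sasl_mechanism="SCRAM-SHA-256",                   # "SASL mechanism"
    sasl_plain_username="gigapipe-user",              # "SASL username" (placeholder)
    sasl_plain_password="secret",                     # "SASL password" (placeholder)
    client_id="gigapipe-integration",                 # "Client ID"
)

# Listing topics confirms the client can reach the cluster and authenticate.
print(consumer.topics())
consumer.close()
```

If this call succeeds with the same values you plan to enter in the form, the integration should be able to reach your Kafka cluster as well.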
Step 2 - Connect the integration with a cluster
- On the Integrations page, click the Connect to cluster button below the Kafka integration you want to connect to a cluster
- Select a Cluster to connect the integration to
- Set any Kafka settings as key/value pairs (optional)
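This page does not list which keys are accepted here; as an assumption, they follow the usual Kafka consumer configuration names. A purely hypothetical illustration of the kind of key/value pairs you might set:

```python
# Hypothetical key/value settings; the exact keys Gigapipe accepts may differ.
kafka_settings = {
    "auto.offset.reset": "earliest",  # where a new consumer group starts reading
    "session.timeout.ms": "30000",    # how long before a silent consumer is evicted
}
```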
Step 3 - Connect the integration with a table
- On the Integrations page, click the Connect to table button below the Kafka integration you want to connect to a table
- Select a Cluster, Database and Table
- Use the columns picker to choose/edit/add columns to match the data coming from Kafka
- Select the topics to subscribe to
- On the next page, fill in the following fields:
  - Queue name: The name of the Kafka table that is initialized with this configuration and connects to the server.
  - Select query: The materialized view related to the queue name.
  - Group name: The group of Kafka consumers. Using the same group name avoids duplicated messages.
  - Format: The format in which the data is streamed to Kafka (see the producer sketch at the end of this section).
  - Row delimiter: The row delimiter character (e.g. \n for CSV).
  - Consumers: The number of consumer threads that subscribe to the Kafka events (the number of consumers cannot exceed the total number of topic partitions).
- Advanced settings (optional):
  - Max block size: The maximum number of messages per batch.
  - Skip broken messages: The fault tolerance for invalid messages.
  - Commit every batch: The number of batches to handle before committing them.
  - Thread per consumer: The number of threads to spawn per consumer.
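Once the integration is connected to a table, anything published to the subscribed topics in the chosen format is ingested. The sketch below is a minimal test producer, again assuming kafka-python, with placeholder broker, topic and row values; it sends CSV rows to match a CSV format with a \n row delimiter as configured above.

```python
# Minimal test producer (broker, topic and row values are placeholders).
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker-1.example.com:9092"],
    # If the cluster requires authentication, reuse the same security_protocol
    # and sasl_* options shown in the consumer sketch above.
)

# Each message carries CSV rows separated by the configured "\n" row delimiter;
# the fields should line up with the columns chosen in the column picker.
rows = [
    "1,first event,2024-01-01 00:00:00",
    "2,second event,2024-01-01 00:00:05",
]
for row in rows:
    producer.send("my-topic", (row + "\n").encode("utf-8"))

producer.flush()
producer.close()
```

After a few messages have been produced, querying the connected table should show the ingested rows.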