
AutoMQ x Databend: Cloud Data Warehouse built in Rust with Kafka Ecosystem


Databend is a state-of-the-art, cloud-native data warehouse developed using Rust, tailored for cloud architectures and leveraging object storage. It provides enterprises with a robust big data analytics platform featuring an integrated lakehouse architecture and a separation of compute and storage resources.

This article outlines the steps to import data from AutoMQ into Databend using bend-ingest-kafka.

Environment Setup

Prepare Databend Cloud and Test Data

First, go to Databend Cloud to launch a warehouse, then create a database and a test table in the worksheet.

create database automq_db;
create table users (
    id bigint NOT NULL,
    name string NOT NULL,
    ts timestamp,
    status string
);
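
To confirm the table was created as expected, you can describe it in the worksheet (a quick optional check):

desc users;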

Prepare AutoMQ and Test Data

Follow the Stand-alone Deployment guide to set up AutoMQ, ensuring there is network connectivity between AutoMQ and Databend.

Next, create a topic named example_topic in AutoMQ and populate it with test JSON data by following the steps below.

Create a Topic

To set up a topic using Apache Kafka® command-line tools, first ensure that you have access to a Kafka environment and the Kafka service is active. Here's an example command to create a topic:

./kafka-topics.sh --create --topic example_topic --bootstrap-server 10.0.96.4:9092 --partitions 1 --replication-factor 1

When executing the command, replace `topic` and `bootstrap-server` with the actual topic name and Kafka server address being used.

Once the topic is created, use the command below to confirm that the topic was successfully established.

./kafka-topics.sh --describe --topic example_topic --bootstrap-server 10.0.96.4:9092

Generate Test Data

Create JSON-formatted test data that matches the table created earlier.

{
  "id": 1,
  "name": "test user",
  "ts": "2023-11-10T12:00:00",
  "status": "active"
}

Write Test Data

Write test data to the `example_topic` topic using Kafka's command-line tools or programmatically. Here is an example using the command-line tool:

echo '{"id": 1, "name": "test user", "ts": "2023-11-10T12:00:00", "status": "active"}' | ./kafka-console-producer.sh --broker-list 10.0.96.4:9092 --topic example_topic

When executing the command, replace `topic` and `broker-list` with the actual topic name and Kafka server address being used.
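
Since the import job configured later in this guide flushes a batch only after accumulating several records (five, in the example command) or after the batch interval elapses, it can help to write multiple records at once. A minimal sketch using a heredoc, assuming the same broker address and topic; the console producer treats each line as one Kafka message:

./kafka-console-producer.sh --broker-list 10.0.96.4:9092 --topic example_topic <<'EOF'
{"id": 2, "name": "second user", "ts": "2023-11-10T12:05:00", "status": "active"}
{"id": 3, "name": "third user", "ts": "2023-11-10T12:10:00", "status": "inactive"}
{"id": 4, "name": "fourth user", "ts": "2023-11-10T12:15:00", "status": "active"}
{"id": 5, "name": "fifth user", "ts": "2023-11-10T12:20:00", "status": "active"}
EOF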

Use the following command to view the data just written to the topic:

./kafka-console-consumer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic --from-beginning

Create a bend-ingest-kafka Job

bend-ingest-kafka monitors the Kafka topic and batch-writes the data into a Databend table. Once bend-ingest-kafka is deployed, the data import job can be started.
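
bend-ingest-kafka is a Go program; if it is not yet installed, one way to fetch it (a sketch assuming a local Go toolchain and that the main package sits at the repository root) is:

go install github.com/databendcloud/bend-ingest-kafka@latest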

bend-ingest-kafka \
  --kafka-bootstrap-servers="localhost:9094" \
  --kafka-topic="example_topic" \
  --kafka-consumer-group="Consumer Group" \
  --databend-dsn="https://cloudapp:password@host:443" \
  --databend-table="automq_db.users" \
  --data-format="json" \
  --batch-size=5 \
  --batch-max-interval=30s

When executing the command, replace `kafka-bootstrap-servers` with the actual Kafka server address being used and `databend-dsn` with your own connection string.

Parameter Description

databend-dsn

The DSN used to connect to the warehouse. It is provided by Databend Cloud and described in the Databend Cloud documentation.

batch-size

bend-ingest-kafka accumulates data up to the specified batch-size before initiating a data synchronization.
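
batch-max-interval

Also set in the example command above; based on the flag name, this appears to cap how long bend-ingest-kafka waits before flushing a partially filled batch, so data is synchronized even when fewer than batch-size records have arrived.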

Validate Data Import

Access the Databend Cloud worksheet and execute a query on the automq_db.users table to verify the synchronization of data from AutoMQ to the Databend Table.
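
For example, run a query such as the following in the worksheet; it should return the test records written earlier:

select * from automq_db.users;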
