Why add RedPanda/Kafka over using async insert? https://clickhouse.com/docs/opti...

Callicles · 2025-10-21T00:53:28 1761008008

Hey,

We chose that infrastructure from the get-go for multiple reasons:

* Having a durable buffer in front of ClickHouse means that big ingest spikes get absorbed by the buffer, not by the OLAP database, which you want to keep responsive when it is powering your online dashboards. ClickHouse Cloud now has compute-compute separation that addresses this, but open source users don't have that.

* When we first shipped this, ClickHouse did not have async buffering in place, so not doing some kind of buffered insert was frowned upon.

* As oatsandsugar mentioned, since then we have also shipped direct insert, so you don't need a Kafka buffer if you don't want one.

* From an architecture standpoint, this setup lets you have multiple consumers of the same stream.

* Finally, having Kafka lets you write streaming functions in your favorite language instead of SQL. The performance-to-task ratio will definitely be lower, but depending on the task it might be faster to set up, or even let you do things you couldn't do directly in the database.
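The buffering and multiple-consumer points can be illustrated with a toy, in-memory model of a single-partition log where each consumer tracks its own offset. This is a sketch under simplifying assumptions: `Log` and `Consumer` are hypothetical classes, not the Kafka or Redpanda client API, and a real broker also persists and replicates the log, which is what makes it durable.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Append-only log standing in for a single-partition Kafka/Redpanda topic."""
    records: list = field(default_factory=list)

    def append(self, record) -> int:
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record


@dataclass
class Consumer:
    """Each consumer (group) tracks its own offset into the shared log."""
    log: Log
    offset: int = 0

    def poll(self, max_records: int) -> list:
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)  # "commit" right after reading
        return batch

    def lag(self) -> int:
        return len(self.log.records) - self.offset


topic = Log()
olap_sink = Consumer(topic)      # hypothetical: loads batches into ClickHouse
other_service = Consumer(topic)  # hypothetical: any second consumer

# A spike of 10,000 events lands in the log at once...
for i in range(10_000):
    topic.append({"event_id": i})

# ...but the OLAP sink drains it at a steady 2,000 rows per tick.
ticks = 0
while olap_sink.lag() > 0:
    olap_sink.poll(max_records=2_000)
    ticks += 1

# The other consumer reads the same data independently, at its own pace.
first = other_service.poll(max_records=5)
```

The spike lands in the log all at once, the OLAP sink drains it in five fixed-size batches, and the second consumer is unaffected because offsets are tracked per consumer rather than per message.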

Disclaimer: I am the CTO at Fiveonefour.

hodgesrm · 2025-10-21T01:43:13 1761010993

> ClickHouse Cloud now has compute/compute that addresses that but open source users don't.

Altinity is addressing this with Project Antalya builds. We have extended open source ClickHouse with stateless swarm clusters to scale queries on shared Iceberg tables.

Disclaimer: CEO of Altinity

bonobocop · 2025-10-21T19:47:23 1761076043

The durability and transformation reasons are definitely more compelling, but the article doesn’t mention those reasons.

It’s mainly focused on the insert batching which is why I was drawing attention to async_insert.
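For context, `async_insert` moves batching to the server: clients send small inserts (for example `INSERT INTO t SETTINGS async_insert = 1, wait_for_async_insert = 1 VALUES ...`) and ClickHouse buffers them and writes them out as larger parts. The toy model below only illustrates the idea of flushing on size or on time since the first buffered insert. The parameter names echo the `async_insert_max_data_size` and `async_insert_busy_timeout_ms` settings, but the class and its logic are a hypothetical simplification, not ClickHouse's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AsyncInsertBuffer:
    """Toy model of server-side insert batching (NOT ClickHouse's real algorithm).

    Flushes when buffered bytes reach max_data_size, or when busy_timeout_ms
    has elapsed since the first insert currently in the buffer.
    """
    max_data_size: int
    busy_timeout_ms: int
    pending: list = field(default_factory=list)
    pending_bytes: int = 0
    first_insert_ms: Optional[int] = None
    parts_written: list = field(default_factory=list)  # rows per flushed part

    def insert(self, row: bytes, now_ms: int) -> None:
        self._flush_if_timed_out(now_ms)  # a stale buffer is flushed first
        if self.first_insert_ms is None:
            self.first_insert_ms = now_ms
        self.pending.append(row)
        self.pending_bytes += len(row)
        if self.pending_bytes >= self.max_data_size:
            self.flush()

    def tick(self, now_ms: int) -> None:
        """Called periodically, like a background timer."""
        self._flush_if_timed_out(now_ms)

    def _flush_if_timed_out(self, now_ms: int) -> None:
        if (self.first_insert_ms is not None
                and now_ms - self.first_insert_ms >= self.busy_timeout_ms):
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.parts_written.append(len(self.pending))  # one part per flush
        self.pending, self.pending_bytes, self.first_insert_ms = [], 0, None


buf = AsyncInsertBuffer(max_data_size=1_000, busy_timeout_ms=200)
for t in range(250):                 # 250 tiny inserts, one per millisecond
    buf.insert(b"x" * 10, now_ms=t)  # 10 bytes each
buf.tick(now_ms=400)                 # timer fires; flushes the leftover rows
```

With these thresholds, 250 ten-byte inserts become three parts instead of 250: two flushed on size and the last one flushed on timeout.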

I think it’s also worth highlighting the incremental transformations that CH can do via materialised views. These can often replace the need for a full-blown streaming transformation pipeline.
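Materialized views in ClickHouse behave like insert triggers: the view's `SELECT` runs over each newly inserted block, not over the whole source table, and the result is written to a target table. The sketch below models that behavior in plain Python with hypothetical `Table` and `make_daily_sum_mv` helpers; in ClickHouse itself you would declare it with `CREATE MATERIALIZED VIEW ... TO target AS SELECT ...`, typically with a `SummingMergeTree` or `AggregatingMergeTree` target.

```python
from collections import defaultdict


class Table:
    def __init__(self):
        self.rows = []
        self.triggers = []  # materialized views attached to this table

    def insert(self, block):
        self.rows.extend(block)
        # Like a ClickHouse MV, each trigger sees only the newly inserted
        # block, never the full table.
        for mv in self.triggers:
            mv(block)


def make_daily_sum_mv(target):
    """Hypothetical MV: SELECT day, sum(amount) GROUP BY day, run per block."""
    def on_insert(block):
        partial = defaultdict(int)
        for row in block:
            partial[row["day"]] += row["amount"]
        # Append partial aggregates; a SummingMergeTree-style target would
        # combine rows with the same key later, during background merges.
        target.rows.extend({"day": d, "amount": s} for d, s in partial.items())
    return on_insert


def final_sums(target):
    """Query-time merge, like sum(amount) ... GROUP BY day over the target."""
    totals = defaultdict(int)
    for row in target.rows:
        totals[row["day"]] += row["amount"]
    return dict(totals)


events, daily = Table(), Table()
events.triggers.append(make_daily_sum_mv(daily))

events.insert([{"day": "mon", "amount": 5}, {"day": "mon", "amount": 7}])
events.insert([{"day": "mon", "amount": 1}, {"day": "tue", "amount": 4}])
```

Each insert contributes its own partial rows to `daily`, so the target holds three rows after two inserts, and the correct totals come from summing again at query time (or from background merges).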

IMO, you can get surprisingly far with “just” a ClickHouse instance these days. I’d definitely be interested in articles about where that threshold is no longer met!

maxjustus · 2025-10-21T15:33:22 1761060802

Nothing stopping an OSS user from pointing inserts at one or more write focused replicas and user facing queries at read focused replicas!
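A minimal sketch of that split, assuming the application (or a proxy in front of it) picks a host per statement. `ReplicaRouter` and the host names are hypothetical, and classifying statements by a leading `INSERT` is deliberately crude.

```python
from itertools import cycle


class ReplicaRouter:
    """Send writes and reads to separate replica pools, round-robin within each."""

    def __init__(self, write_replicas, read_replicas):
        self._writes = cycle(write_replicas)
        self._reads = cycle(read_replicas)

    def route(self, sql: str) -> str:
        # Crude classification: INSERT statements are writes, everything else reads.
        if sql.lstrip().upper().startswith("INSERT"):
            return next(self._writes)
        return next(self._reads)


router = ReplicaRouter(
    write_replicas=["ch-write-1:8123"],                  # hypothetical hosts
    read_replicas=["ch-read-1:8123", "ch-read-2:8123"],
)
```

With replicated tables the read replicas still have to receive the replicated data, so the isolation is partial rather than total.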

olavgg · 2025-10-20T15:41:42 1760974902

The biggest reason is that you may also have other consumers than just Clickhouse.

bonobocop · 2025-10-20T17:36:54 1760981814

Sure, but the article doesn’t talk about that; it seemed to be focused on CH alone, in which case async insert involves far fewer moving parts.

If you need to ensure super-durable writes, you can consider it, but I really don’t think it’s something you need to reach for at first glance.

oatsandsugar · 2025-10-21T00:30:47 1761006647

Author here: I commented here about how you can use async inserts if that's your preferred ingest method (we recommend them for batch).

https://news.ycombinator.com/item?id=45651098

One of the reasons we use streaming ingest is that we often modify the schema of the data in-stream, usually to conform with ClickHouse best practices that the source data doesn't follow (restrictive types, denormalization, non-nullable by default, etc.).
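To illustrate the kind of in-stream reshaping described above, here is a sketch of a per-event transform that narrows types, denormalizes via a lookup, and substitutes defaults instead of NULLs. The field names, the `COUNTRY_BY_USER` lookup, and the cents conversion are hypothetical examples, not the author's actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Hypothetical lookup used for denormalization (e.g. loaded from another topic).
COUNTRY_BY_USER = {"u1": "DE", "u2": "US"}


@dataclass(frozen=True)
class Event:
    """Target row shape: narrow types, no nullable fields."""
    user_id: str
    country: str       # denormalized from the lookup; "" when unknown
    amount_cents: int  # integer instead of a free-form float or string
    ts: datetime       # parsed from epoch seconds, in UTC


def conform(raw: dict) -> Event:
    user_id = raw.get("user_id") or ""  # missing -> default, not NULL
    amount = raw.get("amount")
    return Event(
        user_id=user_id,
        country=COUNTRY_BY_USER.get(user_id, ""),
        # "12.34" -> 1234; a missing amount defaults to 0 instead of NULL
        amount_cents=round(float(amount) * 100) if amount is not None else 0,
        ts=datetime.fromtimestamp(int(raw["ts"]), tz=timezone.utc),
    )
```

For example, `conform({"user_id": "u1", "amount": "12.34", "ts": "1761006647"})` yields an `Event` with `country="DE"` and `amount_cents=1234`, while an event with missing fields still produces a fully populated row.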