What If We Could Rebuild Kafka from Scratch?
The last few days I spent some time digging into the recently announced KIP-1150 ("Diskless Kafka"), as well as AutoMQ’s Kafka fork, which tightly integrates Apache Kafka with object storage such as S3. Following the example set by WarpStream, these projects aim to substantially improve the experience of running Kafka in cloud environments, providing better elasticity, drastically reducing cost, and paving the way towards native lakehouse integration. This got me thinking: if we were to start all over and develop a durable, cloud-native event log from scratch (Kafka.next, if you will), which traits and characteristics would be desirable for it to have? Separation of storage and compute and support for object stores would be table stakes, but what else should be there? Having used Kafka for many years for building event-driven applications as well as for running real-time ETL and change data capture pipelines, here’s my personal wishlist:
For certain use cases it would be helpful to guarantee that derived data views have been updated by the time a produce request gets acknowledged, allowing Kafka to act as the log of a true database with strong read-your-own-writes semantics. Spinning up a new customer-specific environment should be a very cheap operation, happening instantaneously; the workloads of individual tenants should be strictly isolated, not interfering with each other with regard to access control and security, resource utilization, metering, and so on.
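To make the read-your-own-writes idea concrete, here is a minimal sketch of what a produce path with a view-aware acknowledgement level might look like. Everything here is hypothetical: LogClient, AckLevel, DERIVED_VIEWS, and queryView are names invented for illustration, not part of any existing Kafka client API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical Kafka.next-style client API, sketched for illustration only.
public class ReadYourOwnWritesSketch {

    // Acknowledgement levels: DERIVED_VIEWS would complete only once all
    // views derived from the topic have applied the write.
    enum AckLevel { LEADER, QUORUM, DERIVED_VIEWS }

    interface LogClient {
        // Returns a future that completes according to the requested ack level.
        CompletableFuture<Void> produce(String topic, byte[] key, byte[] value, AckLevel ack);

        // Reads from a server-maintained materialized view.
        byte[] queryView(String viewName, byte[] key);
    }

    static void placeOrder(LogClient client, byte[] orderId, byte[] order) {
        // Wait until derived views (e.g. an "orders-by-id" lookup table)
        // reflect the write before treating the produce as successful...
        client.produce("orders", orderId, order, AckLevel.DERIVED_VIEWS)
              .orTimeout(5, TimeUnit.SECONDS)
              .join();

        // ...so an immediate follow-up read is guaranteed to see the write.
        byte[] stored = client.queryView("orders-by-id", orderId);
        assert stored != null : "read-your-own-writes violated";
    }
}
```

Today’s acks setting only covers replication of the log itself; extending the acknowledgement boundary out to derived views would buy this stronger consistency at the cost of higher produce latency, so it would presumably have to be opt-in per request.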