News
Kafka Streams error handling, Part 2: How to catch and fix bad records with KIP-1034 and LLMs
2+ hour, 54+ min ago (1122+ words) June 15, 2026 | By Paul Brebner This "Apache Kafka Streams news processing application with KIP-1034" blog series is split into two parts. Part 1 focused on the architecture: the data model, error scenarios, and the Kafka Streams topology that connects producers, DLQs, and…...
How to build an Apache Kafka Streams application with KIP-1034 Dead Letter Queues and LLM repair, part 1: Introduction and streams topology
2+ hour, 54+ min ago (995+ words) June 15, 2026 | By Paul Brebner In this two-part Kafka Streams tutorial, we build a Kafka Streams application that processes streaming news data in real time. The example we'll use throughout is a simple news aggregation service. It collects articles from multiple…...
What are Asynchronous Data Inserts in ClickHouse
6+ day, 2+ hour ago (545+ words) June 09, 2026 | By Vikas Kumar Asynchronous data inserts in ClickHouse are a server-side batching mechanism that buffers incoming data in memory before writing it to disk. This approach allows ClickHouse to handle high-throughput ingestion smoothly and predictably under heavy…...
How to create a hybrid search pipeline in OpenSearch
1+ week, 4+ day ago (596+ words) June 04, 2026 | By Kassian Wren Hybrid search in OpenSearch is a retrieval method that combines multiple search techniques, such as keyword matching and semantic vector search, into a single, unified result set. The solution? Hybrid search: where multiple retrieval methods…...
Kafka to Iceberg: Build a queryable data lake with ClickHouse
1+ week, 5+ day ago (1452+ words) June 03, 2026 | By Walt Ribeiro That's ClickHouse below querying an Iceberg table on S3 within 0.31 seconds to read metadata and return the first rows. No Spark job, no data movement, and no separate warehouse layer to manage. By the end of…...
Apache Iceberg explained: A better table format for modern data lakes
1+ week, 6+ day ago (983+ words) June 02, 2026 | By Walt Ribeiro Data lakes had a reputation problem. The promise was compelling: dump all your data into cheap object storage (S3, GCS, Azure Blob) and query it whenever you need. The reality was a mess of stale partitions, schema drift,…...
When (and when not) to use Apache Kafka Diskless Topics
1+ mon, 1+ day ago (1420+ words) May 14, 2026 | By Paul Brebner I recently wrote a Visual Guide to Apache Kafka Diskless Topics, which introduces the main ideas behind Kafka Diskless Topics and links to the relevant Kafka KIPs. At present, the only accepted KIP is the high-level…...
Kafka Sink Connectors explained: Bridging your data pipeline
1+ mon, 1+ week ago (754+ words) May 06, 2026 | By Walt Ribeiro A "sink connector" sounds simple, until you realize different systems use the term from different perspectives. In Apache Kafka Connect, a sink connector means data flowing out of Kafka into another system. In ClickHouse, the database…...
Why regular Kafka client upgrades matter: Lessons from the April 2026 CVEs
1+ mon, 1+ week ago (394+ words) May 04, 2026 | By Varun Ghai Keeping Kafka brokers and, just as importantly, Kafka clients up to date is one of the simplest and most effective ways to improve reliability, security, and correctness over time. New Kafka releases routinely include performance improvements,…...
Understanding OpenSearch vector field type, Part 2: sparse_vector
1+ mon, 2+ week ago (648+ words) April 27, 2026 | By Ramya Ravi In part 1, we explored how the knn_vector field type enables semantic search by retrieving results based on meaning rather than exact matches. When building real-world AI applications such as e-commerce search, enterprise search, and conversational systems, you will…...