
This blog compares and contrasts the transactional and message delivery behavior of Kafka with that of the converged Oracle DB and Oracle Transactional Event Queues/AQ.

What makes Kafka so Fast? A Deep Dive into Kafka Storage Internals.

This is a self-contained demo using Materialize to process IoT device data directly from a PostgreSQL server.

In my first few months learning Apache Kafka, I drew up a blog post on the fundamental concepts behind implementing it.

This post is about issues, misunderstandings and sometimes heroic solutions from our experience of using Kafka as the main data exchange platform.

In this article, we will look at how to execute a scheduled task in Keycloak on startup using a Kafka consumer as an example.

Today is a big day for Kuma! Kuma 1.0 is now generally available with over 70 features and improvements ready to use and deploy in production to create modern distributed service meshes for every application running on multiple clusters, clouds, including Kubernetes and VM-based workloads.

In this tutorial, we are going to build a web application using AdonisJS and integrate it with Materialize to create a real-time dashboard.

Specifics and complications of creating a high-load service using .NET and Kafka.

We will build a simple dashboard app that displays data from a Deno Web Socket server.

In a world where data is king, Kafka is a valuable tool for developers and data engineers to learn.

I’ve assisted several large clients in building a microservices-style architecture using Kafka as a messaging backbone, and have a reasonably good understanding of its abilities and the use cases that really bring them out. But I’m not a Kafka apologist by any stretch; any technology that has gone through such a rapid adoption curve is bound to polarise its audience and rub certain developers up the wrong way, and Kafka is no exception. Like anything else, you need to invest a significant amount of time in getting across Kafka and event streaming in general before you become fully proficient and can harness its might. And be prepared to face one or two frustrations, to put it mildly, along the way.

Building an enterprise data warehouse can be either relatively straightforward or very sophisticated. It depends on many factors, such as the complexity of the conceptual data model and the variety of source systems. In many cases, applying the Change Data Capture (CDC) approach can make the data integration simpler. Fortunately, there are plenty of CDC tools available on the market, many of which are easy to use and affordable, while others are cumbersome and expensive (for what they are).

Prep for an Apache Kafka interview by reading these questions! Aimed at juniors.

Kafka versions 0.9 and above provide the capability to store topic offsets directly on the broker instead of relying on ZooKeeper.

Auto-generation of documentation for Event-driven architecture

If you’re an architect or developer looking at event-driven architectures, stream processing might be just what you need to make your app faster, more scalable, and more decoupled.

Now, you can use Cube to build data modeling, caching, and access control layers on top of streaming SQL, just as with cloud data warehouses.

Logs are everywhere in software development. Without them there’d be no relational databases, git version control, or most analytics platforms.

Lambda architecture has three components: a) the speed layer, which is the streaming or real-time data layer; b) the serving layer, which is the database layer, derived by aggregating data from the speed layer; and c) the batch layer, which is the set of computations performed on large sets of data, typically stored in a distributed file system. In this post I will be talking about how to implement the speed layer by visualizing real-time taxi data. After that, the visualization will allow us to make some real-time business decisions. Code for this article can be found here.

This blog provides an overview of two fundamental concepts in Apache Kafka: Topics and Partitions. While developing and scaling our Anomalia Machina application, we discovered that distributed applications using Kafka and Cassandra clusters require careful tuning to achieve close to linear scalability, and that critical variables included the number of Kafka topics and partitions. In this blog, we test that theory and answer questions like “What impact does increasing partitions have on throughput?” and “Is there an optimal number of partitions for a cluster to maximize write throughput?” And more!
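
As a rough illustration of why the partition count matters, here is a minimal sketch of key-based partition assignment. It assumes a keyed record is routed by hashing its key modulo the number of partitions. The function name `partition_for` is hypothetical, and CRC32 is a stand-in chosen for simplicity: Kafka's default partitioner uses a murmur2 hash, so the partition numbers below will not match a real cluster.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition index (illustrative only).

    Assumption: hash(key) % num_partitions routing. CRC32 replaces
    Kafka's murmur2 hash here, so results differ from a real broker.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


# Hypothetical device keys used only for this demonstration.
keys = [f"device-{i}".encode() for i in range(8)]

# The same key always lands on the same partition for a fixed count,
# which is what preserves per-key ordering.
for key in keys:
    print(key.decode(), "->", partition_for(key, 6))

# Changing the partition count can remap keys to different partitions.
moved = [k for k in keys if partition_for(k, 6) != partition_for(k, 12)]
print(len(moved), "of", len(keys), "keys moved after going from 6 to 12 partitions")
```

The second loop hints at one operational cost of changing the partition count later: keys may move, so ordering guarantees that relied on the old mapping no longer hold across the change.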

Kafka itself comes with command-line tools that can do all the administration tasks, but those tools aren’t very convenient: they are not integrated into one tool, so you need to run a different tool for each task. Moreover, they become difficult to work with when your clusters grow large or when you have several clusters.

Deploying Kafka on Kubernetes is a low-effort approach to setting up an event-driven architecture to support your API ecosystem in the cloud.

Designing a data pipeline comes with its own set of problems. Take lambda architecture, for example. In the batch layer, if data somewhere in the past is incorrect, you’d have to rerun the computation function on the whole (possibly terabytes large) dataset, the result of which would then be absorbed into the serving layer and reflected there.

Real-time analytic systems use data processing frameworks, including Apache Kafka and Apache Spark. Learn more here!

In this part I will be talking about the batch layer of the Lambda Architecture. The batch layer is computed by applying a function to the whole historical dataset, to answer high-level questions that cannot be answered by either the speed layer or the serving layer. The computations typically take hours or days to run, and the results are usually stored in a distributed file system (although this is not a requirement). For example, the queries that might need to be answered would range from the beginning of the dataset to now: in our case, how many passengers the cabs have served to date, or the total distance driven by all the cabs. In this article I will try to answer questions like these based on the dataset that I have. The code for the article can be found here.
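
The idea of a batch view as a function applied to the whole dataset can be sketched as follows. This is a minimal in-memory illustration, not the article's code: the record layout, the `TRIPS` sample and the `batch_view` name are all hypothetical, and a real batch layer would read from a distributed file system rather than a Python list.

```python
from collections import defaultdict

# Hypothetical trip records: (cab_id, passengers, distance_km).
TRIPS = [
    ("cab-1", 2, 5.0),
    ("cab-2", 1, 3.5),
    ("cab-1", 3, 7.5),
]


def batch_view(trips):
    """Recompute the view from scratch over the full history."""
    passengers_per_cab = defaultdict(int)
    total_distance_km = 0.0
    for cab_id, passengers, distance_km in trips:
        passengers_per_cab[cab_id] += passengers
        total_distance_km += distance_km
    return {
        "passengers_per_cab": dict(passengers_per_cab),
        "total_distance_km": total_distance_km,
    }


view = batch_view(TRIPS)
print(view)
```

Because the view is always rebuilt from the complete dataset, correcting a bad historical record only requires fixing the input and running the function again.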

Technical design. Because one of the most common use cases of the new databases is storing data that is generated by high-throughput sources, it is important that the storage engine is able to handle write-intensive workloads, all while offering acceptable read performance. RocksDB implements what is known in the database literature as a log-structured merge tree, or LSM tree.
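
To make the LSM tree idea more concrete, here is a toy sketch under heavy simplifying assumptions: everything lives in memory, there is no write-ahead log, no deletes, and no leveled compaction. The `TinyLSM` class and its method names are hypothetical and are not RocksDB's API. It only shows the core pattern: writes go to a small mutable memtable, full memtables are flushed as immutable sorted runs, and reads check the newest data first.

```python
import bisect


class TinyLSM:
    """Toy log-structured merge store (illustrative, in-memory only)."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}  # mutable buffer for recent writes
        self.runs = []      # immutable sorted runs, oldest first

    def put(self, key, value):
        # Writes only touch the memtable, which keeps them cheap.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable into a sorted run of (key, value) pairs.
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # Search runs from newest to oldest so recent writes win.
        for run in reversed(self.runs):
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, keeping the newest value per key.
        merged = {}
        for run in self.runs:
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)   # memtable is full, so it is flushed to a run
db.put("a", 3)   # a newer value for "a" shadows the flushed one
db.put("c", 4)   # second flush
print(db.get("a"), db.get("b"), len(db.runs))
db.compact()
print(db.get("a"), len(db.runs))
```

The trade-off the blurb describes shows up even here: `put` never searches existing data, while `get` may have to look through several runs until compaction merges them.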

Microservices, Machine Learning & Big Data are making waves among organizations. Curiously they all share the same biggest concern: data.

Running systems in production involves requirements for high availability, resilience and recovery from failure. When running cloud-native applications this becomes even more critical, as the base assumption in such environments is that compute nodes will suffer outages, Kubernetes nodes will go down and microservice instances are likely to fail, yet the service is expected to remain up and running.

Kafka & Spark integration may be tricky when Kafka is protected by Kerberos. Here is the guide on how to access Kafka with Spark and Spark Streaming.

Having worked with Kafka for more than two years now, there are two configs whose interaction I've seen be ubiquitously confused.

This post was co-written with Ben Wilcock, Product and Technical Marketing Manager for Spring at Pivotal.

Introduction

Let’s imagine we have XML data on a queue in IBM MQ, and we want to ingest it into Kafka to then use downstream, perhaps in an application or maybe to stream to a NoSQL store like MongoDB.

Comparing enterprise messaging and event streaming across different dimensions to see how they excel at solving different but related messaging problems.

My team has recently successfully decoupled one of the critical business domains of the company. The initial integration had such a tough deadline that the only way to meet it was to add code to the monolith. And… The feature that went from conception to production in three weeks ended up taking almost one year to decouple.

In this part I will be talking about the serving layer of the Lambda Architecture. The serving layer is derived either by performing computation on batch data to arrive at a view that is midway between the speed layer and the batch layer

This article was originally posted to the Confluent blog.

Intro

In this tutorial, we'll walk you through how to use Docker, Kafka, and Kubernetes to deploy a simple application.

In this article we will cover the core concepts of Kafka and also will touch upon a few of the advanced topics.

In this blog, Paul Brebner, Instaclustr's tech evangelist, explains Apache ZooKeeper using the famous dining philosophers problem.

Here are five tips on how Kafka works and how you can get started with Apache Kafka.

Event sourcing, eventual consistency, microservices, CQRS... These are quickly becoming household names in mainstream application development. But do you know what makes them tick? What are the basic building blocks required to assemble complex, business-centric applications from fine-grained services without turning the lot into a big ball of mud?