Designing Event-Driven Systems

Metadata
- Document Note: The book "Designing Event-Driven Systems" discusses the use of Kafka, an event streaming platform, for building event-driven systems. The book covers topics such as managing data evolution with schemas, maintaining strong ordering guarantees, and implementing streaming services with Kafka Streams and KSQL. It also explains the use of event sourcing and CQRS patterns to help systems scale and be less prone to corruption. The book emphasizes the importance of a shared source of truth and loose coupling in designing event-driven systems.
- URL: https://readwise.io/reader/document_raw_content/18412683
Highlights
But the services model made interaction a first-class entity. Suddenly your users weren’t just customers or businesspeople; they were other applications, and they really cared that your service was reliable. So applications became platforms, and building platforms is hard. (View Highlight) #✂️
Complex dependencies led to instability, versioning issues caused painful lockstep releases, and early on, it wasn’t clear that the new architecture was actually an improvement. (View Highlight) #✂️
Enterprise service buses (ESBs), for example, have vocal detractors and traditional messaging systems have a number of issues of their own. They are often used to move data around an organization, but the absence of any notion of history limits their value. So, even though recent events typically have more value than old ones, business operations still need historical data—whether it’s users wanting to query their account history, some service needing a list of customers, or analytics that need to be run for a management report. (View Highlight) #✂️
Replayable logs decouple services from one another, much like a messaging system does, but they also provide a central point of storage that is fault-tolerant and scalable—a shared source of truth that any application can fall back to. (View Highlight) #✂️
A shared source of truth turns out to be a surprisingly useful thing. Microservices, for example, don’t share their databases with one another (referred to as the IntegrationDatabase antipattern). There is a good reason for this: databases have very rich APIs that are wonderfully useful on their own, but when widely shared they make it hard to work out if and how one application is going to affect others, be it data couplings, contention, or load. But the business facts that services do choose to share are the most important facts of all. (View Highlight) #✂️
A replayable log provides a far more suitable place to hold this kind of data because (somewhat counterintuitively) you can’t query it! It is purely about storing data and pushing it to somewhere new. This idea of pure data movement is important, because data on the outside—the data services share—is the most tightly coupled of all, and the more services an ecosystem has, the more tightly coupled this data gets (View Highlight) #✂️
A replayable log–based approach has two primary benefits. First, it makes it easy to react to events that are happening now, with a toolset specifically designed for manipulating them. Second, it provides a central repository that can push whole datasets to wherever they may be needed. (View Highlight) #✂️
Making message-at-a-time network calls isn’t a particularly good idea when you’re handling a high-throughput event stream. For this reason stream processors write data locally (so writes and reads are fast) and back those writes up to Kafka (View Highlight) #✂️
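A minimal Kafka Streams sketch of this local-state pattern, counting orders per key in a local store that Kafka backs with an internal changelog topic. The application id, broker address, topic name ("orders"), and store name are all hypothetical placeholders.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Materialized;

public class LocalStateSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "order-counts");          // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");     // hypothetical broker
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> orders = builder.stream("orders");               // hypothetical topic

        // Counts are kept in a local state store ("order-counts-store"), so reads
        // and writes stay fast; every update is also written to an internal
        // changelog topic in Kafka, so the store can be rebuilt after a failure.
        orders.groupByKey().count(Materialized.as("order-counts-store"));

        new KafkaStreams(builder.build(), props).start();
    }
}
```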
A broker is a separate piece of infrastructure that broadcasts messages to any programs that are interested in them, as well as storing them for as long as is needed. So it’s perfect for streaming or fire-and-forget messaging. (View Highlight) #✂️
ESBs are criticized in some circles. This criticism arises from the way the technology has been built up over the last 15 years, particularly where ESBs are controlled by central teams that dictate schemas, message flows, validation, and even transformation. In practice centralized approaches like this can constrain an organization, making it hard for individual applications and services to evolve at their own pace. (View Highlight) #✂️
So Kafka may look a little like an ESB, but as we’ll see throughout this book, it is very different. It provides a far higher level of throughput, availability, and storage, and there are hundreds of companies routing their core facts through a single Kafka cluster. Beyond that, streaming encourages services to retain control, particularly of their data, rather than providing orchestration from a single, central team or platform. (View Highlight) #✂️
A Kafka cluster is essentially a collection of files, filled with messages, spanning many different machines. Most of Kafka’s code involves tying these various individual logs together, routing messages from producers to consumers reliably, replicating for fault tolerance, and handling failure gracefully. (View Highlight) #✂️
A Kafka cluster is a distributed system, spreading data over many machines both for fault tolerance and for linear scale-out. The system is designed to handle a range of use cases, from high-throughput streaming, where only the latest messages matter, to mission-critical use cases where messages and their relative ordering must be preserved with the same guarantees as you’d expect from a DBMS (database management system) or storage system. (View Highlight) #✂️
At the heart of the Kafka messaging system sits a partitioned, replayable log. The log-structured approach is itself a simple idea: a collection of messages, appended sequentially to a file. When a service wants to read messages from Kafka, it “seeks” to the position of the last message it read, then scans sequentially, reading messages in order while periodically recording its new position in the log (View Highlight) #✂️
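A sketch of that seek, scan, and record-position loop with the plain Java consumer. The topic name, partition, group id, and broker address are assumptions, and error handling is omitted.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SeekAndScan {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");          // hypothetical broker
        props.put("group.id", "account-history-reader");           // hypothetical group
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("customer-events", 0); // hypothetical topic
            consumer.assign(Collections.singletonList(partition));

            // "Seek" to the position after the last message this group read.
            OffsetAndMetadata committed =
                    consumer.committed(Collections.singleton(partition)).get(partition);
            consumer.seek(partition, committed == null ? 0L : committed.offset());

            while (true) {
                // Scan forward sequentially, reading messages in order.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d key=%s value=%s%n",
                            record.offset(), record.key(), record.value());
                }
                // Periodically record the new position back in the log.
                consumer.commitSync();
            }
        }
    }
}
```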
In fact, when you read messages from Kafka, the server doesn’t even import them into the JVM (Java virtual machine). Data is copied directly from the disk buffer to the network buffer (zero copy)—an opportunity afforded by the simplicity of both the contract and the underlying data structure (View Highlight) #✂️
There are a few implications to this log-structured approach. If a service has some form of outage and doesn’t read messages for a long time, the backlog won’t cause the infrastructure to slow significantly (a common problem with traditional brokers, which have a tendency to slow down as they get full). Being log-structured also makes Kafka well suited to performing the role of an event store, for those who like to apply Event Sourcing within their services. (View Highlight) #✂️
Partitions are a fundamental concept for most distributed data systems. A partition is just a bucket that data is put into, much like buckets used to group data in a hash table. In Kafka’s terminology each log is a replica of a partition held on a different machine. (View Highlight) #✂️
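The bucket idea can be shown in a few lines. This is illustrative only: it is not Kafka's actual default partitioner (which hashes the serialized key with murmur2), just the hash-modulo shape of the mapping.

```java
public class PartitionBucketSketch {
    // Illustrative hash-modulo bucketing: same key -> same bucket every time.
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 6; // hypothetical partition count
        for (String customerId : new String[] {"customer-42", "customer-42", "customer-7"}) {
            System.out.printf("%s -> partition %d%n", customerId, partitionFor(customerId, partitions));
        }
    }
}
```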
There are a couple of things that need to be considered to ensure strong ordering guarantees. The first is that messages that require relative ordering need to be sent to the same partition. (Kafka provides ordering guarantees only within a partition.) This is managed for you: you supply the same key for all messages that require a relative order. So a stream of customer information updates would use the CustomerId as their partitioning key. All messages for the same customer would then be routed to the same partition, and hence be strongly ordered (View Highlight) #✂️
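For example, a producer might key customer updates by CustomerId along these lines; the topic name, broker address, and payloads are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class KeyedProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String customerId = "customer-42";
            // Both updates carry the same key, so they land on the same partition
            // and are delivered to consumers in the order they were sent.
            producer.send(new ProducerRecord<>("customer-updates", customerId, "address-changed"));
            producer.send(new ProducerRecord<>("customer-updates", customerId, "email-changed"));
        }
    }
}
```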
Sometimes key-based ordering isn’t enough, and global ordering is required. This often comes up when you’re migrating from legacy messaging systems where global ordering was an assumption of the original system’s design. To maintain global ordering, use a single partition topic. Throughput will be limited to that of a single machine, but this is typically sufficient for use cases of this type. (View Highlight) #✂️
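A single-partition topic can be created with the AdminClient roughly as below; the topic name is hypothetical and replication factor 1 assumes a single-broker dev setup.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class SinglePartitionTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker

        try (AdminClient admin = AdminClient.create(props)) {
            // One partition => one total order over all messages in the topic,
            // at the cost of limiting throughput to a single machine.
            NewTopic topic = new NewTopic("legacy-global-order", 1, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```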
Compacted topics work a bit like simple log-structured merge-trees (LSM trees). The topic is scanned periodically, and old messages are removed if they have been superseded (based on their key); see Figure 4-5. It’s worth noting that this is an asynchronous process, so a compacted topic may contain some superseded messages, which are waiting to be compacted away. (View Highlight) #✂️
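Compaction is enabled per topic through the cleanup.policy setting; a sketch using the AdminClient follows, where the topic name, partition count, and replication factor are assumptions for a local single-broker setup.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps only the latest value per key. The log
            // cleaner removes superseded values asynchronously, so older values
            // may still be visible until compaction has run.
            NewTopic customers = new NewTopic("customers", 6, (short) 1)
                    .configs(Collections.singletonMap(
                            TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singletonList(customers)).all().get();
        }
    }
}
```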