In-Sync Replicas Management

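In-sync replicas (ISR) are crucial for Kafka's reliability guarantees. A replica is considered in-sync if it:

  • Has an active ZooKeeper session
  • Fetches regularly from the leader
  • Has caught up to the leader's recent messages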

Key Configurations

  1. zookeeper.session.timeout.ms

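    • Time without a heartbeat after which ZooKeeper considers the broker dead
    • Default: 18 seconds (since 2.5.0)
    • Higher values tolerate transient network or GC pauses, which improves stability in cloud environments, but delay detection of a genuinely failed broker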
  2. replica.lag.time.max.ms

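    • Maximum time a follower can go without fetching, or without catching up to the leader's log end, before the leader removes it from the ISR
    • Default: 30 seconds (since 2.5.0)
    • Affects consumer latency: consumers only see messages replicated to every ISR member, so a slow follower that is still in the ISR delays them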
  3. min.insync.replicas

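    • Minimum number of in-sync replicas that must acknowledge a write sent with acks=all
    • Works with acks=all for strongest durability; not enforced for acks=0 or acks=1
    • If the ISR shrinks below this value, acks=all writes are rejected with a NotEnoughReplicas error, so the partition is effectively read-only (consumers can still read)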
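
To make the interaction between replica.lag.time.max.ms and min.insync.replicas concrete, here is a minimal Python sketch. It is a simplified model, not Kafka's actual implementation: the `Replica` class, its `last_caught_up_ms` field, and the helpers `current_isr` and `accepts_write` are hypothetical names invented for illustration, and min.insync.replicas=2 is an assumed setting rather than the default.

```python
from dataclasses import dataclass

REPLICA_LAG_TIME_MAX_MS = 30_000  # default since Kafka 2.5.0
MIN_INSYNC_REPLICAS = 2           # assumed value for this example (Kafka's default is 1)


@dataclass
class Replica:
    broker_id: int
    # Hypothetical field: last time this replica had fully caught up to the leader.
    last_caught_up_ms: int


def current_isr(replicas, now_ms, max_lag_ms=REPLICA_LAG_TIME_MAX_MS):
    """Keep only replicas that caught up within the last max_lag_ms."""
    return [r.broker_id for r in replicas if now_ms - r.last_caught_up_ms <= max_lag_ms]


def accepts_write(isr, acks, min_isr=MIN_INSYNC_REPLICAS):
    """Model the leader's check: min.insync.replicas only applies to acks=all."""
    if acks == "all":
        return len(isr) >= min_isr
    return True


# Broker 1 is the leader, broker 2 is 5 s behind, broker 3 stalled 40 s ago.
replicas = [Replica(1, 100_000), Replica(2, 95_000), Replica(3, 60_000)]

isr = current_isr(replicas, now_ms=100_000)
print(isr)                               # [1, 2]
print(accepts_write(isr, "all"))         # True: 2 in-sync replicas >= 2

# 50 s later broker 2 has stalled too; only the leader has kept up.
replicas[0].last_caught_up_ms = 150_000
isr_later = current_isr(replicas, now_ms=150_000)
print(isr_later)                         # [1]
print(accepts_write(isr_later, "all"))   # False: acks=all writes are rejected
print(accepts_write(isr_later, "1"))     # True: acks=1 is not subject to the check
```

In this model a longer replica.lag.time.max.ms keeps slow followers in the ISR for longer, which makes the acks=all check less likely to fail but lets a lagging follower hold back committed offsets for longer.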

Flashcards

What makes a replica considered "in-sync"?:: An active ZooKeeper session, regular fetches from the leader, and being caught up to the leader's recent messages

What happens when in-sync replicas fall below min.insync.replicas?:: Writes from producers using acks=all are rejected, so the partition is effectively read-only for them (consumers can still read)

What is the purpose of replica.lag.time.max.ms?:: Defines the maximum time a follower can lag behind the leader before being removed from the ISR