Log Compaction and Retention
Log Cleanup Policies
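- Delete Policy (default for user topics)
  - Removes whole log segments once they are older than the retention time (default 1 week)
  - Removes the oldest segments once a partition grows past the retention size (default unlimited)
  - Enabled with log.cleanup.policy=delete (broker default) or cleanup.policy=delete (per topic)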
- Compact Policy (default for __consumer_offsets)
  - Removes older records that share a key with a newer record
  - Retains at least the latest value for each key
  - Enabled with log.cleanup.policy=compact (broker default) or cleanup.policy=compact (per topic)
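
To make "latest value per key" concrete, here is a minimal Python sketch of what a log looks like after a compaction pass. The `Record` type and the `compact` function are hypothetical names used only for illustration; they are not Kafka APIs, and the real cleaner works on segment files rather than in-memory lists. The sketch assumes the records are given in offset order.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    offset: int
    key: str
    value: Optional[str]


def compact(records: List[Record]) -> List[Record]:
    """Keep only the newest record for each key (illustrative model)."""
    # Remember the highest offset seen for every key.
    latest_offset = {}
    for record in records:
        latest_offset[record.key] = record.offset
    # A record survives only if it is the newest one for its key.
    # Survivors keep their original offsets and relative order.
    return [r for r in records if latest_offset[r.key] == r.offset]


log = [
    Record(0, "user-1", "a"),
    Record(1, "user-2", "b"),
    Record(2, "user-1", "c"),
]
compacted = compact(log)
print(compacted)
```

The record at offset 0 is dropped because offset 2 carries a newer value for the same key; the surviving records are not renumbered.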
Retention Configuration
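- Time-based Retention
  - log.retention.hours (default 168, i.e. 7 days)
  - log.retention.minutes (unset by default)
  - log.retention.ms (unset by default)
  - When more than one is set, the finest-grained unit wins: ms over minutes over hours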
- Size-based Retention
  - log.retention.bytes
  - Limits the size of each partition's log, not the topic as a whole
  - Default is -1 (unlimited)
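
The precedence rule and the per-partition size limit can be sketched in a few lines of Python. Both functions below are hypothetical helpers written for these notes, not Kafka code. The size model is a simplification that assumes deletion happens a whole segment at a time, oldest first, and only while the partition would still hold at least the configured number of bytes afterwards; it ignores the special handling of the active segment.

```python
from typing import List, Optional


def effective_retention_ms(
    retention_ms: Optional[int] = None,
    retention_minutes: Optional[int] = None,
    retention_hours: int = 168,
) -> int:
    """Resolve time-based retention in milliseconds; the finest unit wins."""
    if retention_ms is not None:
        return retention_ms
    if retention_minutes is not None:
        return retention_minutes * 60 * 1000
    return retention_hours * 60 * 60 * 1000


def segments_to_delete(segment_sizes: List[int], retention_bytes: int) -> int:
    """Count how many of the oldest segments a size limit would remove.

    segment_sizes is ordered oldest first (simplified model).
    """
    if retention_bytes < 0:  # -1 means unlimited
        return 0
    excess = sum(segment_sizes) - retention_bytes
    deleted = 0
    for size in segment_sizes:
        # Stop once removing another segment would go below the limit.
        if excess - size < 0:
            break
        excess -= size
        deleted += 1
    return deleted


print(effective_retention_ms())
print(effective_retention_ms(retention_minutes=10, retention_hours=1))
print(segments_to_delete([100, 100, 100], retention_bytes=150))
```

Under this model a partition can sit somewhat above log.retention.bytes, because a segment is only removed when the remaining data still covers the limit.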
Log Compaction Behavior
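- Timing
  - Only closed segments are eligible; the active segment is never compacted
  - The cleaner thread sleeps for log.cleaner.backoff.ms (default 15 seconds) when it finds nothing to clean
  - A log is picked for cleaning once its dirty ratio exceeds log.cleaner.min.cleanable.ratio (default 0.5)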
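- Important Notes
  - Preserves the relative order of the records it keeps
  - Does not deduplicate in real time: consumers can still see several records for a key until the cleaner has run
  - Offsets never change; compaction leaves gaps instead of renumbering
  - Delete markers (tombstones, records with a null value) stay readable for delete.retention.ms (default 24 hours) before they are removed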
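
The dirty ratio trigger and the delete.retention.ms window can be modeled roughly as follows. This is a sketch under stated assumptions, not the broker's implementation: the function names are hypothetical, the dirty ratio is taken as dirty bytes divided by total bytes, the 0.5 threshold is the default of min.cleanable.dirty.ratio, and `reference_ms` stands in for whichever timestamp the broker measures the tombstone window from.

```python
def dirty_ratio(clean_bytes: int, dirty_bytes: int) -> float:
    """Fraction of the log that has not been compacted yet."""
    total = clean_bytes + dirty_bytes
    return dirty_bytes / total if total else 0.0


def should_compact(
    clean_bytes: int, dirty_bytes: int, min_cleanable_ratio: float = 0.5
) -> bool:
    """True once the dirty portion exceeds the configured threshold."""
    return dirty_ratio(clean_bytes, dirty_bytes) > min_cleanable_ratio


def tombstone_expired(
    reference_ms: int, now_ms: int, delete_retention_ms: int = 86_400_000
) -> bool:
    """True once a delete marker has outlived its retention window.

    reference_ms is an assumed starting point for the window.
    """
    return now_ms - reference_ms > delete_retention_ms


print(should_compact(clean_bytes=600, dirty_bytes=400))
print(should_compact(clean_bytes=400, dirty_bytes=600))
print(tombstone_expired(reference_ms=0, now_ms=1_000))
```

Keeping tombstones around for a while gives consumers that are reading the log time to observe the deletion before the marker itself disappears.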
Flashcards
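How often is log compaction evaluated?:: The cleaner thread checks continuously, sleeping for log.cleaner.backoff.ms (default 15 seconds) when idle; it compacts closed segments once enough of the log is "dirty"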
What are the two log cleanup policies in Kafka?:: delete (age/size based) and compact (key based)
Which retention setting takes precedence when multiple are configured?:: The smallest unit (ms over minutes over hours)