Kafka Cluster Setup
Kafka Cluster Size Discussions
- 1 Broker only good for dev
- 3 Brokers - Good for most cases
- 100 Brokers - Zookeeper probably under pressure, will likely need more zookeeper instances, likely have a full time team for managing cluster
- Only scale up when a bottleneck is reached
Kafka Configuration
-
Configuring Kafka in production is an 'art'
-
It requires really good understanding of:
- Operating Systems Architecture
- Servers Architecture
- Distributed Computing
- CPU operations
- Network perofrmance Disk I/O
- RAM and Heap size
- Page cache
- Kafka and Zookeeper
-
There are over 140 configuration parameters available for Kafka
- Only a few parameters are needed to get started
- Importance is classified between mandatory, high, medium, and low
-
You will never get an optimal configuration the first time
-
Configuration evolves over time
-
Server Basics
broker.id=1The id of the broker, must be unique for each brokeradvertised.listeners=PLAINTEXT://kafka1:9092change your.host.name by your machine's IP or hostnamedelete.topic.enable=trueswithc the enable topic deletion or not, default value is false
-
Log Basics
log.dirs=/data/kafkaa comma separated list of directories under which to store log filesnum.partitions=8The default number of log partitions per topicdefault.replication.factor=3we will have 3 brokers so the default replication factor should be 2 or 3min.insync.replicas=2number of ISR to have in order to minimize data loss
-
Log Retention Policy
log.retention.hours=168The minimum age of a log file to be eligible for deletion due to agelog.segment.bytes=123135245The maximum size of a log segment file. When this size is reached a new log segment will be createdlog.retention.check.interval.ms=30000The interval at which log segments are checked to see if they can be deleted according to the retention policies
-
Zookeeper
zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/kafkaZookeeper connection string withchrootstring to the root directory for all kafka znodeszookeeper.connection.timeout.ms=6000Timeout in ms for connecting to zookeeper
-
Other
auto.create.topics.enable=trueSet to false in production
Hands On Single Kafka Broker Setup:
- Augment the file handle limits
Advertised Listeners Setting
The most important setting of Kafka
Kafka Broker 1
Public IP: 34.56.78.90
Private IP: 172.31.9.1
ADV_HOST=172.31.9.1
Basically the broker designates exactly what ip/hostname the broker wants you to use. If you designate that the advertised host is localhost or some other private IP that the client doesn't have access to, then the kafka client will fail to access the broker
The IPs need to be static.