Topics
A particular stream of data. Like a table in a database without all the constraints. You can have as many topics as you want. A topic is identified by its name.
Any kind of message format, JSON, Avro , etc.
The sequence of messages is called a Data Stream. You cannot query topics directly through Kafka, you will send data using Producers and read data through Consumers.
Topics are Immutable.
Data in a topic is kept only for a limited time (default is one week but is configurable.)
Once a topic is created, the Message Serializer /Consumer Deserializer type must not change during a topic lifecycle (create a new topic instead).
Topics are made up of Partitions.
References
Flashcards
Topics::: A particular stream of data.
Topics can contain these message formats::: JSON, AVRO, Binary, anything
To query a topic::: You can't do this directly, you send data using Producers and read data through Consumers
A sequence of messages::: Data stream
Topics are Immutable and cannot change
The default retention period for a topic is:: One week
A topic's messages serialization type must not:: change during a topic's lifecycle
If a topic has a replication factor of 3...:: Each partition will live on 3 different brokers
There are 3 brokers. A kafka topic has a replication factor of 3 and min.insync.replicas setting of 1. What is the maximum number of brokers that can be down so that a producer with acks=all can still produce to the topic?:: Two brokers can go down, and one replica will still be able to receive and serve data
A client connects to a broker in the cluster and sends a fetch request for a partition in a topic. It gets an exception NotLeaderForPartitionException in the response. How does client handle this situation?:: Send metadata request to the same broker for the topic and select the broker hosting the leader replica. In case the consumer has the wrong leader of a partition, it will issue a metadata request. The Metadata request can be handled by any node, so clients know afterwards which broker are the designated leader for the topic partitions. Produce and consume requests can only be sent to the node hosting partition leader.
Your topic is log compacted and you are sending a message with the key K and value null. What will happen?:: The broker will delete all messages with the key K upon cleanup. Sending a message with the null value is called a tombstone in Kafka and will ensure the log compacted topic does not contain any messages with the key K upon compaction
When auto.create.topics.enable is set to true in Kafka configuration, what are the circumstances under which a Kafka broker automatically creates a topic? (select three)
- Producer sends message to a topic
- Client requests metadata for a topic
- Consumer reads message from a topic
- Client alters number of partitions of a topic
? - Producer sends message to a topic
- Client requests metadata for a topic
- Consumer reads message from a topic
A kafka broker automatically creates a topic under the following circumstances: - When a producer starts writing messages to the topic - When a consumer starts reading messages from the topic - When any client requests metadata for the topic
Compaction is enabled for a topic in Kafka by setting log.cleanup.policy=compact. What is true about log compaction?:: After cleanup, only one message per key is retained with the latest value. Log compaction retains at least the last known value for each record key for a single topic partition. All compacted log offsets remain valid, even if record at offset has been compacted away as a consumer will get the next highest offset.
What is true about partitions? (select two)
- A broker can have a partition and its replica on its disk
- You cannot have more partitions than the number of brokers in your cluster
- Only out of sync replicas are replicas, the remaining partitions that are in sync are also leader
- A partition has one replica that is a leader, while the other replicas are followers
- A broker can have different partitions numbers for the same topic on its disk
? - A partition has one replica that is a leader, while the other replicas are followers
- A broker can have different partitions numbers for the same topic on its disk
Only one of the replicas is elected as partition leader. And a broker can definitely hold many partitions from the same topic on its disk, try creating a topic with 12 partitions on one broker!
What is true about replicas?
- Produce and consume requests are load-balanced between Leader and Follower replicas
- Follower replica handles all consume requests
- Produce requests can be done to the replicas that are followers
- Leader replica handles all produce and consume requests
?
Leader replica handles all produce and consume requests. Replicas are passive - they don't handle produce or consume request. Produce and consume requests get sent to the node hosting partition leader.
How will you set the retention for the topic named "my-topic" to 1 hour?:: Set the topic config retention.ms to 3600000. retention.ms can be configured at topic level while creating topic or by altering topic. It shouldn't be set at the broker level (log.retention.ms) as this would impact all the topics in the cluster, not just the one we are interested in
There are 3 brokers in the cluster. You want to create a topic with a single partition that is resilient to one broker failure and one broker maintenance. What is the replication factor will you specify while creating the topic?:: 3. 1 is not possible as it doesn't provide resilience to failure, 2 is not enough as if we take a broker down for maintenance, we cannot tolerate a broker failure, and 6 is impossible as we only have 3 brokers (RF cannot be greater than the number of brokers). Here the correct answer is 3
By default, which replica will be elected as a partition leader? (select two)
- Preferred leader broker if it is in-sync and auto.leader.rebalance.enable=false
- Any of the replicas
- Preferred leader broker if it is in-sync and auto.leader.rebalance.enable=true
- An in-sync replica
? - Preferred leader broker if it is in-sync and auto.leader.rebalance.enable=true
- An in-sync replica
Preferred leader is a broker that was leader when topic was created. It is preferred because when partitions are first created, the leaders are balanced between brokers. Otherwise, any of the in-sync replicas (ISR) will be elected leader, as long as unclean.leader.election=false (by default)'
What physically is Kafka partitions made of?:: One file and 2 indexes per segment. Kafka partitions are made of segments (usually each segment is 1GB), and each segment has two corresponding indexes (offset index and time index)
Which of the following statements are true regarding the number of partitions of a topic?
- We can add partitions in a topic using the kafka-topics.sh command
- We can remove partitions in a topic using the kafka-topics.sh command
- The number of partitions in a topic cannot be altered
- We can add partitions in a topic by adding a broker to the cluster
- We can remove partitions in a topic by removing a broker
?
We can only add partitions to an existing topic, and it must be done using the kafka-topics.sh command