Topics

A topic is a particular stream of data, similar to a table in a database but without all the constraints. You can have as many topics as you want. A topic is identified by its name.

Topics accept any kind of message format: JSON, Avro, etc.

The sequence of messages is called a data stream. You cannot query topics directly through Kafka; instead, you send data using Producers and read data through Consumers.

Topics are immutable: once data is written to a partition, it cannot be changed.
Data in a topic is kept only for a limited time (the default is one week, but this is configurable).

Once a topic is created, the message serializer / consumer deserializer type must not change during the topic's lifecycle (create a new topic instead).
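
A minimal sketch of why the format must stay fixed, using only the Python standard library rather than a Kafka client. The function names `serialize_json` and `deserialize_json` and the sample payloads are made up for illustration. The point is that a consumer's deserializer only understands the bytes its matching serializer wrote.

```python
import json


def serialize_json(value: dict) -> bytes:
    """Producer side: turn a dict into the bytes stored in the topic."""
    return json.dumps(value).encode("utf-8")


def deserialize_json(payload: bytes) -> dict:
    """Consumer side: turn stored bytes back into a dict."""
    return json.loads(payload.decode("utf-8"))


# The round trip works while both sides agree on the format.
original = {"user_id": 42, "action": "login"}
assert deserialize_json(serialize_json(original)) == original

# Hypothetical format change: the producer starts writing a raw 4-byte integer
# into the same topic. The existing JSON consumer can no longer read it.
new_format_payload = (42).to_bytes(4, "big")
try:
    deserialize_json(new_format_payload)
    consumer_broke = False
except (UnicodeDecodeError, json.JSONDecodeError):
    consumer_broke = True
```

Writing the new format to a new topic keeps existing consumers working while new consumers are rolled out.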

Topics are made up of Partitions.

References

Quote

Messages in Kafka are categorized into topics. The closest analogies for a topic are a database table or a folder in a filesystem (Page 5) #✂️

Quote

The term stream is often used when discussing data within systems like Kafka. Most often, a stream is considered to be a single topic of data, regardless of the number of partitions. This represents a single stream of data moving from the producers to the consumers. (Page 6) #✂️

Flashcards

Topics::: A particular stream of data.

Topics can contain these message formats::: JSON, Avro, binary, anything

To query a topic::: You can't do this directly; you send data using Producers and read data through Consumers

A sequence of messages::: Data stream

Topics are:: Immutable; once data is written to a partition, it cannot be changed

The default retention period for a topic is:: One week

A topic's messages serialization type must not:: change during a topic's lifecycle

If a topic has a replication factor of 3...:: Each partition will live on 3 different brokers

There are 3 brokers. A Kafka topic has a replication factor of 3 and a min.insync.replicas setting of 1. What is the maximum number of brokers that can be down so that a producer with acks=all can still produce to the topic?:: Two brokers can go down, and one replica will still be able to receive and serve data
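
The arithmetic behind this answer can be sketched as below. This is a simplified model, not Kafka code: it assumes each replica lives on its own broker and that every replica on a live broker is in sync. The function names are made up for illustration.

```python
def max_brokers_down(replication_factor: int, min_insync_replicas: int) -> int:
    """Most replicas that can be lost while acks=all writes still succeed."""
    return replication_factor - min_insync_replicas


def can_produce_with_acks_all(
    replication_factor: int, min_insync_replicas: int, brokers_down: int
) -> bool:
    """True if enough in-sync replicas remain to accept an acks=all write."""
    in_sync = replication_factor - brokers_down
    return in_sync > 0 and in_sync >= min_insync_replicas


print(max_brokers_down(3, 1))              # 2
print(can_produce_with_acks_all(3, 1, 2))  # True: one replica is left
print(can_produce_with_acks_all(3, 2, 2))  # False: one replica < min.insync.replicas=2
```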

A client connects to a broker in the cluster and sends a fetch request for a partition in a topic. It gets a NotLeaderForPartitionException in the response. How does the client handle this situation?:: Send a metadata request to the same broker for the topic and select the broker hosting the leader replica. If the consumer has the wrong leader for a partition, it issues a metadata request. The metadata request can be handled by any node, so the client then knows which brokers are the designated leaders for the topic partitions. Produce and consume requests can only be sent to the node hosting the partition leader.
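
The refresh-and-retry flow described in this answer can be sketched with a toy in-memory model. `FakeCluster`, `FakeClient`, and the Python exception class are hypothetical stand-ins, not the real Kafka client API.

```python
class NotLeaderForPartition(Exception):
    """Stand-in for Kafka's NotLeaderForPartitionException."""


class FakeCluster:
    def __init__(self, leaders):
        # Maps (topic, partition) -> broker id of the current leader.
        self.leaders = dict(leaders)

    def metadata(self, broker_id, topic, partition):
        # Any broker can answer a metadata request.
        return self.leaders[(topic, partition)]

    def fetch(self, broker_id, topic, partition):
        # In this model, only the leader serves fetch requests.
        if self.leaders[(topic, partition)] != broker_id:
            raise NotLeaderForPartition(f"broker {broker_id} is not the leader")
        return f"records for {topic}-{partition} from broker {broker_id}"


class FakeClient:
    def __init__(self, cluster, cached_leaders):
        self.cluster = cluster
        self.cached_leaders = dict(cached_leaders)

    def fetch(self, topic, partition):
        key = (topic, partition)
        broker_id = self.cached_leaders[key]
        try:
            return self.cluster.fetch(broker_id, topic, partition)
        except NotLeaderForPartition:
            # Ask the same broker for fresh metadata, then retry on the leader.
            self.cached_leaders[key] = self.cluster.metadata(broker_id, topic, partition)
            return self.cluster.fetch(self.cached_leaders[key], topic, partition)


cluster = FakeCluster({("my-topic", 0): 2})
client = FakeClient(cluster, {("my-topic", 0): 1})  # stale cache: points at broker 1
result = client.fetch("my-topic", 0)
print(result)  # records for my-topic-0 from broker 2
```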

Your topic is log compacted and you are sending a message with the key K and value null. What will happen?:: The broker will delete all messages with the key K upon cleanup. A message with a null value is called a tombstone in Kafka; it ensures that the log-compacted topic does not contain any messages with the key K after compaction
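
A toy model of the tombstone behaviour, assuming the log is a plain list of `(key, value)` pairs in offset order. It simplifies real compaction, where the tombstone itself is kept for a configurable period (`delete.retention.ms`) before being removed. The function name `compact` is made up for illustration.

```python
def compact(log):
    """Return the surviving key -> value pairs after compaction.

    `log` is a list of (key, value) tuples in offset order. A value of None
    plays the role of a tombstone.
    """
    latest = {}
    for key, value in log:
        latest[key] = value  # later records overwrite earlier ones
    # Keys whose latest record is a tombstone disappear entirely.
    return {key: value for key, value in latest.items() if value is not None}


log = [("K", "v1"), ("J", "a"), ("K", "v2"), ("K", None)]
print(compact(log))  # {'J': 'a'}
```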

When auto.create.topics.enable is set to true in Kafka configuration, what are the circumstances under which a Kafka broker automatically creates a topic? (select three)

Compaction is enabled in Kafka by setting log.cleanup.policy=compact (the broker-level default) or cleanup.policy=compact (on a single topic). What is true about log compaction?:: After cleanup, only one message per key is retained with the latest value. Log compaction retains at least the last known value for each record key for a single topic partition. All compacted log offsets remain valid, even if the record at an offset has been compacted away; a consumer will get the next highest offset.
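
The offset behaviour can be illustrated with a small in-memory model of one partition. It assumes records are `(offset, key, value)` tuples; the helper names are hypothetical and this is not how the broker is implemented.

```python
def compact_with_offsets(log):
    """Keep only the latest record per key, leaving surviving offsets unchanged."""
    last_offset_for_key = {}
    for offset, key, _value in log:
        last_offset_for_key[key] = offset
    return [rec for rec in log if last_offset_for_key[rec[1]] == rec[0]]


def fetch_from(compacted, requested_offset):
    """Return the first surviving record at or after the requested offset."""
    for record in compacted:
        if record[0] >= requested_offset:
            return record
    return None


log = [(0, "a", "1"), (1, "b", "1"), (2, "a", "2"), (3, "c", "1"), (4, "b", "2")]
compacted = compact_with_offsets(log)
print(compacted)                 # [(2, 'a', '2'), (3, 'c', '1'), (4, 'b', '2')]
print(fetch_from(compacted, 0))  # (2, 'a', '2'): offset 0 is gone, next highest is served
```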

What is true about partitions? (select two)

What is true about replicas?

How will you set the retention for the topic named "my-topic" to 1 hour?:: Set the topic config retention.ms to 3600000. retention.ms can be configured at the topic level when creating the topic or by altering it. It shouldn't be set at the broker level (log.retention.ms), as this would impact all the topics in the cluster, not just the one we are interested in
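
As a quick check of the number, here is a small helper that converts a duration to the millisecond value `retention.ms` expects. The helper name `retention_ms` is made up; it only does the arithmetic and does not talk to Kafka.

```python
from datetime import timedelta


def retention_ms(duration: timedelta) -> int:
    """Convert a duration to whole milliseconds, as used by retention.ms."""
    return duration // timedelta(milliseconds=1)


print(retention_ms(timedelta(hours=1)))  # 3600000
print(retention_ms(timedelta(weeks=1)))  # 604800000 (the one-week default)
```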

There are 3 brokers in the cluster. You want to create a topic with a single partition that is resilient to one broker failure and one broker maintenance. What replication factor will you specify while creating the topic?:: 3. 1 is not possible as it doesn't provide resilience to failure, 2 is not enough as if we take a broker down for maintenance, we cannot tolerate a broker failure, and 6 is impossible as we only have 3 brokers (RF cannot be greater than the number of brokers). Here the correct answer is 3
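
The reasoning reduces to "one more replica than the number of brokers that may be unavailable at once, capped by the broker count". A sketch under that assumption, with a hypothetical function name:

```python
def required_replication_factor(brokers_unavailable: int, broker_count: int) -> int:
    """Smallest replication factor that leaves one replica alive.

    `brokers_unavailable` counts brokers that may be down at the same time
    (failures plus planned maintenance).
    """
    replication_factor = brokers_unavailable + 1
    if replication_factor > broker_count:
        raise ValueError("replication factor cannot exceed the number of brokers")
    return replication_factor


# One failure plus one broker in maintenance, on a 3-broker cluster.
print(required_replication_factor(2, 3))  # 3
```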

By default, which replica will be elected as a partition leader? (select two)

What are Kafka partitions physically made of?:: One file and 2 indexes per segment. Kafka partitions are made of segments (usually each segment is 1 GB), and each segment has two corresponding indexes (offset index and time index)
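
To make the layout concrete, the sketch below builds the three file names for a segment, assuming the usual convention that files are named after the segment's base offset, zero-padded to 20 digits. The function name is made up and nothing here reads a real log directory.

```python
def segment_files(base_offset: int) -> list:
    """Names of the data file and two index files for one segment."""
    name = f"{base_offset:020d}"  # base offset, zero-padded to 20 digits
    return [f"{name}.log", f"{name}.index", f"{name}.timeindex"]


for file_name in segment_files(0):
    print(file_name)
# 00000000000000000000.log
# 00000000000000000000.index
# 00000000000000000000.timeindex
```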

Which of the following statements are true regarding the number of partitions of a topic?